A twelve-language dataset for the study of event descriptions in expository and narrative texts

Written language allows us to communicate about events across time and space. The same event, however, can be described in a virtually infinite number of ways. At present, we do not fully understand the extent to which event descriptions differ across individual writers, discourse genres, and languages. Here, we introduce a twelve-language dataset in which 100 participants per language (Dutch, English, German, Hindi, Japanese, Lithuanian, Macedonian, Mandarin, Bosnian-Croatian-Serbian, Spanish, Swedish, Turkish) each described two events across two discourse genres (expository vs. narrative) in a standardized online text elicitation study. The resulting dataset, comprising 2,400 short texts written by 1,200 different participants, makes it possible to study a wide variety of potential similarities and differences in how the same events and referents are described across individual writers, languages, and discourse genres. As such, it complements the use of multilingual parallel corpora in the language sciences and will advance our understanding of how writers around the globe turn the invisible mental models in their minds into visible words on paper.
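
The totals in the description follow directly from the study design. The sketch below is only an arithmetic check, not part of the dataset: the constant and function names are illustrative, and the figure of two texts per participant is an assumption inferred from the reported totals (2,400 texts from 1,200 participants). How events were paired with genres for each participant is not specified in this record.

```python
# Languages as listed in the dataset description.
LANGUAGES = [
    "Dutch", "English", "German", "Hindi", "Japanese", "Lithuanian",
    "Macedonian", "Mandarin", "Bosnian-Croatian-Serbian", "Spanish",
    "Swedish", "Turkish",
]

PARTICIPANTS_PER_LANGUAGE = 100

# Assumption: two texts per participant, inferred from the reported
# totals (2,400 texts / 1,200 participants), not stated explicitly.
TEXTS_PER_PARTICIPANT = 2


def design_totals(languages=LANGUAGES,
                  per_language=PARTICIPANTS_PER_LANGUAGE,
                  texts_each=TEXTS_PER_PARTICIPANT):
    """Return (number of participants, number of texts) implied by the design."""
    participants = len(languages) * per_language
    texts = participants * texts_each
    return participants, texts


participants, texts = design_totals()
print(f"{len(LANGUAGES)} languages, {participants} participants, {texts} texts")
```

Under these assumptions the computed totals match the 1,200 participants and 2,400 texts reported above.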

Identifier
DOI https://doi.org/10.34894/PZSUSU
Metadata Access https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/PZSUSU
Provenance
Creator Peeters, David
Publisher DataverseNL
Contributor TiU Dataverse Admins; Peeters, David; Tilburg University; DataverseNL
Publication Year 2026
Rights CC-BY-4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess true
Contact TiU Dataverse Admins (Tilburg University); Peeters, David (Tilburg University)
Representation
Resource Type text data; Dataset
Format text/csv; application/vnd.openxmlformats-officedocument.spreadsheetml.sheet; application/octet-stream; application/vnd.openxmlformats-officedocument.wordprocessingml.document; application/pdf; text/comma-separated-values
Size 176451; 88474; 3239; 76573; 200807; 94659; 4294; 56983; 187390; 87805; 4912; 58871; 373411; 1081793; 217510; 101626; 5682; 112473; 407445; 137978; 9267; 139669; 136571; 60243; 3566; 69885; 249066; 121436; 6710; 65867; 350860; 133287; 8945; 98937; 132744; 71939; 3046; 86371; 197994; 93848; 6090; 58217; 192024; 95392; 4840; 62080; 268759; 119338; 3935; 134525
Version 1.0
Discipline Humanities; Linguistics