A twelve-language dataset for the study of event descriptions in expository and narrative texts

Dataset

Written language allows us to communicate about events across time and space. The same event, however, can be described in a virtually infinite number of ways. At present, we do not fully understand the extent to which event descriptions differ across individual writers, discourse genres, and languages. Here, we introduce a twelve-language dataset in which 100 participants per language (Dutch, English, German, Hindi, Japanese, Lithuanian, Macedonian, Mandarin, Bosnian-Croatian-Serbian, Spanish, Swedish, Turkish) each described two events across two discourse genres (expository vs. narrative) in a standardized online text elicitation study. The resulting dataset, comprising 2,400 short texts written by 1,200 different participants, makes it possible to study a wide variety of potential similarities and differences in how the same events and referents are described across individual writers, languages, and discourse genres. As such, it complements the use of multilingual parallel corpora in the language sciences and will advance our understanding of how writers around the globe turn the invisible mental models in their minds into visible words on paper.

Identifier
DOI	https://doi.org/10.34894/PZSUSU
Metadata Access	https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/PZSUSU
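
The Metadata Access link above is an OAI-PMH GetRecord request. As a minimal sketch, the same URL can be assembled from the DOI in Python. The helper name `oai_getrecord_url` and the constant `OAI_ENDPOINT` are illustrative assumptions, not part of the record; only the endpoint, verb, metadata prefix, and identifier come from the link itself.

```python
from urllib.parse import urlencode

# Endpoint taken from the Metadata Access URL in this record.
OAI_ENDPOINT = "https://dataverse.nl/oai"


def oai_getrecord_url(doi: str, metadata_prefix: str = "oai_datacite") -> str:
    """Build an OAI-PMH GetRecord URL for a DOI (hypothetical helper)."""
    query = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Leave ':' and '/' unescaped so the result matches the URL shown in the record.
    return f"{OAI_ENDPOINT}?{urlencode(query, safe=':/')}"


url = oai_getrecord_url("10.34894/PZSUSU")
print(url)
```

Requesting this URL (for example with `urllib.request.urlopen`) would be expected to return the DataCite metadata as XML, assuming the endpoint is reachable.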

Provenance
Creator	Peeters, David
Publisher	DataverseNL
Contributor	TiU Dataverse Admins; Peeters, David; Tilburg University; DataverseNL
Publication Year	2026
Rights	CC-BY-4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess	true
Contact	TiU Dataverse Admins (Tilburg University); Peeters, David (Tilburg University)

Representation
Resource Type	text data; Dataset
Format	text/csv; application/vnd.openxmlformats-officedocument.spreadsheetml.sheet; application/octet-stream; application/vnd.openxmlformats-officedocument.wordprocessingml.document; application/pdf; text/comma-separated-values
Size	176451; 88474; 3239; 76573; 200807; 94659; 4294; 56983; 187390; 87805; 4912; 58871; 373411; 1081793; 217510; 101626; 5682; 112473; 407445; 137978; 9267; 139669; 136571; 60243; 3566; 69885; 249066; 121436; 6710; 65867; 350860; 133287; 8945; 98937; 132744; 71939; 3046; 86371; 197994; 93848; 6090; 58217; 192024; 95392; 4840; 62080; 268759; 119338; 3935; 134525
Version	1.0
Discipline	Humanities; Linguistics