Replication Data for: A corpus-based analysis of the Dat-Nom/Nom-Dat alternation in German

Dataset abstract The dataset includes an annotated sample of N = 13292 German written sentences with a Nominative and a Dative argument. The sentences comprise 76 different verbs taking two alternating object orders: 5591 sentences occur with the Dat-Nom order, 7701 sentences occur with the Nom-Dat order. Each sentence is annotated for Object order, the sentence Verb and several features related to both objects, including: (pro)nominality, pronoun type, referentiality, person, number, definiteness, animacy, and length. The sentences and the two objects are shared in a separate .csv file. An R Notebook with the data analysis is provided, as well as an HTML file with both the R code and the output of the analysis.

Article abstract A subgroup of German Nom-Dat verbs have received considerable attention in the literature due to the propensity of the dative to occur preverbally, which is unexpected on an object analysis of the dative (see references below). Here we argue for an alternative analysis, namely that the relevant verbs alternate between two different argument structures, Dat-Nom and Nom-Dat, and hence that either argument, the dative or the nominative, may be the syntactic subject. Earlier studies have shown that topicalisation of direct arguments is found in ca. 4–12% of the cases in German texts. For comparison, we have extracted 13,000 tokens of 76 verbs from the deTenTen13 corpus and coded them for ten different variables. Our findings support an alternating Dat-Nom/Nom-Dat analysis for these verbs, as 42% of the tokens instantiate the Dat-Nom order and the remaining 58% instantiate the Nom-Dat order. In contexts with full NPs only, the share of Dat-Nom tokens is even higher, 46% compared to 54% Nom-Dat, which altogether excludes a topicalisation analysis of the Dat-Nom word order. In order to throw further light on the alternation, we carry out a multivariate analysis which confirms the effect of topicality, definiteness, length, the animacy of the dative and the inanimacy of the nominative.
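
The 42% and 58% shares quoted in the article abstract can be checked against the sample size and the Dat-Nom count given in the dataset abstract. The following is a minimal sketch, not part of the published R Notebook; it assumes that every sentence has exactly one of the two orders, so that the Nom-Dat count is the remainder of N. The constant and function names are illustrative only.

```python
# Counts as reported in the dataset abstract.
TOTAL_SENTENCES = 13292
DAT_NOM = 5591

# Assumption: each sentence is either Dat-Nom or Nom-Dat,
# so the Nom-Dat count is whatever remains of the sample.
NOM_DAT = TOTAL_SENTENCES - DAT_NOM


def share(count, total):
    """Return count as a percentage of total, rounded to a whole number."""
    return round(100 * count / total)


dat_nom_share = share(DAT_NOM, TOTAL_SENTENCES)
nom_dat_share = share(NOM_DAT, TOTAL_SENTENCES)

print(f"Dat-Nom: {DAT_NOM} sentences ({dat_nom_share}%)")
print(f"Nom-Dat: {NOM_DAT} sentences ({nom_dat_share}%)")
```

Deriving the Nom-Dat count from N keeps the check tied to the two figures that agree with the percentages in the article abstract.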

MS Excel, 2309

R, 4.2.3

RStudio (Posit Software, PBC), 2023.09.0

SketchEngine, 2023

Identifier
DOI https://doi.org/10.18710/CRSJLY
Metadata Access https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/CRSJLY
Provenance
Creator Somers, Joren; Leuschner, Torsten; De Cuypere, Ludovic; Barðdal, Jóhanna
Publisher DataverseNO
Contributor De Cuypere, Ludovic; Ghent University; Vrije Universiteit Brussel; The Tromsø Repository of Language and Linguistics (TROLLing)
Publication Year 2025
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact De Cuypere, Ludovic (Vrije Universiteit Brussel - Ghent University)
Representation
Resource Type annotated corpus data; Dataset
Format text/plain; application/octet-stream; text/html; text/comma-separated-values
Size 20680; 30308; 1144482; 263259; 4474225
Version 1.0
Discipline Humanities; Linguistics
Spatial Coverage Ghent, Belgium
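
The Metadata Access URL listed under Identifier is an OAI-PMH GetRecord request. As a rough sketch of how such a request URL is put together, the snippet below rebuilds it from the DOI using only the Python standard library. The helper name `oai_get_record_url` is hypothetical, and the endpoint and parameter values are simply those visible in the URL above; nothing here is fetched over the network.

```python
from urllib.parse import urlencode

# Endpoint taken from the Metadata Access URL in this record.
OAI_ENDPOINT = "https://dataverse.no/oai"


def oai_get_record_url(doi, metadata_prefix="oai_datacite"):
    """Build a GetRecord request URL for a DOI (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the identifier reads as in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


url = oai_get_record_url("10.18710/CRSJLY")
print(url)
```

The resulting string can be passed to any HTTP client to retrieve the DataCite-formatted metadata for this dataset.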