Dataset description:
The General Regionally Annotated Corpus of Ukrainian (GRAC, Shvedova et al. 2017-2024, uacorpus.org) was consulted to collect data for further analysis concerning the distribution of Singular vs. Plural verb forms in the target bahato construction. GRAC is a Sketch Engine corpus of over 1.8 billion words, representing texts from over 30,000 authors created between 1816 and 2023. This corpus is designed to serve as source material for linguistic research on Standard Ukrainian. Our data was collected during the month of February 2024. We extracted and annotated 28,491 examples of the bahato construction.
An additional set of examples was collected from the Russian National Corpus (ruscorpora.ru) during the month of August 2024 to provide comparison with the Russian mnogo construction. For this purpose, 6,612 examples were extracted and annotated for word order and Singular vs. Plural verb agreement.
Both the Ukrainian and the Russian data are included in this dataset, along with the R scripts used to analyze this data.
Article abstract:
We reveal an ongoing language change in Ukrainian involving a construction with a subject comprised of the indefinite quantifier багато ‘many’ modifying a noun phrase in the Genitive Plural. Number agreement on the verb varies, allowing both Singular (in 69.1% of attestations) and Plural (in 30.9% of attestations). Based on statistical analysis of corpus data, we investigate the influence of the factors of year of creation, word order of subject and verb, and animacy of the subject on the choice of verb number. We find that, while all combinations of word order and animacy are robustly attested, VS word order and inanimate subjects tend to prefer Singular, whereas SV word order and animate subjects tend to prefer Plural. Since about the 1950s, the proportion of Plural has been increasing, overtaking Singular in the current decade. We propose that this Singular vs. Plural variation is motivated by the human embodied experience of construing a group of items as either a homogeneous mass (and therefore Singular) or a multiplicity of individuals (and therefore Plural). This proposal is supported by the identification of micro-constructions that prefer Singular and show reduced individuation of human beings.
R, 4.4.1 (2024-06-14) -- "Race for Your Life"