Description of Dataset
This dataset contains examples of Russian predicate adjectives in present-tense clauses with a zero copula, where the adjective is either a short form (SF) or a nominative long form (LF). The data were collected in 2022 from SynTagRus (https://universaldependencies.org/treebanks/ru_syntagrus/index.html), the syntactic subcorpus of the Russian National Corpus (https://ruscorpora.ru/new/).
The dataset merges the results of several searches designed to extract sentences with long-form and short-form adjectives in predicate position, as annotated in the corpus. The examples were imported into a spreadsheet and annotated manually on the basis of the syntactic analyses provided in the corpus. For present-tense sentences with no copula (Река спокойна or Река спокойная, both 'The river is calm'), it was necessary to search for an adjective as the top (root) node of the syntactic structure. The syntactic and morphological categories used in the corpus are explained here: https://ruscorpora.ru/page/instruction-syntax/.
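The root-node criterion can be illustrated with a small R sketch. It is not the search procedure used for this dataset (the searches were run in the Russian National Corpus interface); it assumes instead a token table in the style of the Universal Dependencies CoNLL-U release of SynTagRus, where short-form adjectives carry the feature Variant=Short and an overt copula is attached with the relation cop. The token table and the function name find_zero_copula_adj are hypothetical.

```r
# Toy token table in the style of CoNLL-U columns (hypothetical example data,
# transliterated; not taken from the dataset). head = 0 marks the root.
tokens <- data.frame(
  sent_id = c(1, 1, 2, 2, 3, 3, 3),
  id      = c(1, 2, 1, 2, 1, 2, 3),
  form    = c("Reka", "spokojna", "Reka", "spokojnaja", "Reka", "byla", "spokojna"),
  upos    = c("NOUN", "ADJ", "NOUN", "ADJ", "NOUN", "AUX", "ADJ"),
  feats   = c("Case=Nom", "Variant=Short", "Case=Nom", "Case=Nom",
              "Case=Nom", "Tense=Past", "Variant=Short"),
  head    = c(2, 0, 2, 0, 3, 3, 0),
  deprel  = c("nsubj", "root", "nsubj", "root", "nsubj", "cop", "root"),
  stringsAsFactors = FALSE
)

# Keep sentences whose root is an adjective and that contain no overt copula,
# then label the adjective as SF (Variant=Short) or nominative LF (Case=Nom).
find_zero_copula_adj <- function(tokens) {
  roots <- tokens[tokens$deprel == "root" & tokens$upos == "ADJ", ]
  # Sentences with an overt copula, e.g. past-tense "byla"
  with_cop <- unique(tokens$sent_id[tokens$deprel == "cop"])
  hits <- roots[!(roots$sent_id %in% with_cop), ]
  hits$form_type <- ifelse(grepl("Variant=Short", hits$feats, fixed = TRUE), "SF",
                    ifelse(grepl("Case=Nom", hits$feats, fixed = TRUE), "LF", NA))
  # Drop root adjectives that are neither short forms nor nominative long forms
  hits[!is.na(hits$form_type), c("sent_id", "form", "form_type")]
}

result <- find_zero_copula_adj(tokens)
print(result)
```

In the toy data, sentences 1 and 2 are kept as SF and LF respectively, while sentence 3 is excluded because its adjective has an overt past-tense copula.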
To run the R code from these files, set up an R project with the data files in a folder named "data" and the R Markdown (.Rmd) files in a folder named "scripts".
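Because knitr evaluates the chunks of an .Rmd file with that file's folder as the working directory, code in "scripts" has to reach the data one level up. The sketch below shows one way to build such paths; the helper data_path and the file name example.csv are hypothetical, since the actual data file names are not listed here.

```r
# Hypothetical helper: path to a file in data/ when code runs from scripts/.
# data/ is assumed to be a sibling of scripts/ inside the R project folder.
data_path <- function(file_name, scripts_dir = getwd()) {
  file.path(dirname(normalizePath(scripts_dir)), "data", file_name)
}

# Build a throwaway project with the expected layout to try it out.
proj <- file.path(tempdir(), "adj_project")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "scripts"), showWarnings = FALSE)
write.csv(data.frame(form = c("SF", "LF")),
          file.path(proj, "data", "example.csv"), row.names = FALSE)

# Read the file as if the code were running from scripts/
df <- read.csv(data_path("example.csv", scripts_dir = file.path(proj, "scripts")))
print(df)
```

Inside a knitted .Rmd in "scripts", calling data_path("...") with the default scripts_dir should resolve to the "data" folder of the project.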
Method: Logistic regression analysis of the corpus data, carried out in R version 4.2.3 (2023-03-15, "Shortstop Beagle") and documented in an .Rmd file.
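A binary logistic regression of this kind can be fitted in R with glm() and family = binomial. The sketch below is not the model from the .Rmd file: the data are simulated, and the column names subject_type, gender_number, log_freq and form, as well as their levels, are hypothetical stand-ins for the predictors named in the abstract (subject type, gender/number, frequency).

```r
set.seed(1)
n <- 200

# Simulated toy data (hypothetical columns and levels, not the real dataset)
toy <- data.frame(
  subject_type  = factor(sample(c("noun", "pronoun", "clause"), n, replace = TRUE)),
  gender_number = factor(sample(c("m.sg", "f.sg", "n.sg", "pl"), n, replace = TRUE)),
  log_freq      = rnorm(n)
)
# With a factor response, glm() treats the first level (LF) as failure,
# so the model estimates the log-odds of the short form (SF).
toy$form <- factor(sample(c("LF", "SF"), n, replace = TRUE), levels = c("LF", "SF"))

model <- glm(form ~ subject_type + gender_number + log_freq,
             data = toy, family = binomial)
summary(model)

# Predicted probability of SF for each toy example
p <- predict(model, type = "response")
```

Since the response here is random, the coefficients carry no linguistic meaning; the block only shows the model specification.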
Publication Abstract
This article presents an empirical investigation of the choice between so-called long forms (e.g., prostoj ‘simple’) and short forms (e.g., prost ‘simple’) of predicate adjectives in Russian, based on data from the syntactic subcorpus of the Russian National Corpus. The data under scrutiny suggest that short forms represent the dominant option for predicate adjectives. It is proposed that long forms are descriptions of thematic participants in sentences with no complement, while short forms may take complements and describe both participants (thematic and rhematic) and situations. Within the “space of competition”, where both long and short forms are well attested, it is argued that the choice of form depends to some extent on subject type, gender/number, and frequency. On the methodological level, the approach adopted in the present study may be extended to other cases of competition in morphosyntax. It is suggested that one should first “peel off” contexts where (nearly) categorical rules are at work before undertaking a statistical analysis of the “space of competition”.
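The “peel off” step can be pictured as a filter applied before any model is fitted: contexts in which one form is (nearly) categorical are set aside, and only the remaining “space of competition” is passed on to the regression. The sketch below is a hypothetical illustration of that idea, not the procedure in the .Rmd file; the toy data, the context labels, the helper peel_off and the 0.95 threshold are all assumptions.

```r
# Hypothetical toy data: one row per corpus example.
ex <- data.frame(
  context = c(rep("with_complement", 6), rep("no_complement", 6)),
  form    = c(rep("SF", 6), "SF", "LF", "SF", "LF", "LF", "SF"),
  stringsAsFactors = FALSE
)

# Remove contexts where one form reaches the (assumed) threshold share.
peel_off <- function(d, threshold = 0.95) {
  # Share of short forms in each context
  sf_share <- tapply(d$form == "SF", d$context, mean)
  # Contexts that are (nearly) categorical for SF or for LF
  categorical <- names(sf_share)[sf_share >= threshold | sf_share <= 1 - threshold]
  d[!(d$context %in% categorical), ]
}

competition <- peel_off(ex)
print(competition)
```

Here the toy “with_complement” context is always SF and is peeled off, leaving only the six “no_complement” rows for statistical analysis.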