The dataset contains the quantitative data used as input for the Principal Components Analysis conducted in the article "The many guises of productivity: a case-study of Spanish inchoative constructions".
The data originate from the Spanish Web Corpus (esTenTen18), accessed via Sketch Engine (Kilgarriff & Renau 2013). Only the European Spanish subcorpus was selected. After downloading, the samples were cleaned manually. A maximum of 500 tokens per auxiliary was retained in the dataset.
The data were annotated for 'Subject', 'AUX', 'Filler', 'Person', 'Tense', 'LexicalTypeInf', 'SyntaxInf', 'Intercalation', 'Intentionality', and 'Abruptness', as well as other criteria that are not considered in this study. For this analysis, only two variables are used: the auxiliary, abbreviated as 'AUX', and the infinitive, abbreviated as 'INF'.
See data-specific sections below for more information about the variables.
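As a rough illustration of how the two variables used in the analysis can be tabulated, the sketch below counts how often each infinitive occurs with each auxiliary and applies the cap of 500 tokens per auxiliary. It is not the procedure used for the article: the function names, the toy rows, and the choice to keep the first 500 tokens in input order are assumptions made for this example, since the description does not state how the retained tokens were selected.

```python
from collections import Counter, defaultdict

MAX_TOKENS_PER_AUX = 500  # cap stated in the dataset description


def cap_per_auxiliary(tokens, limit=MAX_TOKENS_PER_AUX):
    """Keep at most `limit` (AUX, INF) pairs per auxiliary, in input order.

    Assumption: keeping the first `limit` tokens is an illustrative choice.
    """
    seen = Counter()
    kept = []
    for aux, inf in tokens:
        if seen[aux] < limit:
            seen[aux] += 1
            kept.append((aux, inf))
    return kept


def aux_by_inf_counts(tokens):
    """Return a nested frequency table: {AUX: Counter({INF: frequency})}."""
    table = defaultdict(Counter)
    for aux, inf in tokens:
        table[aux][inf] += 1
    return table


# Toy rows invented for illustration only; they are not taken from the dataset.
toy = [
    ("empezar", "hablar"),
    ("empezar", "llorar"),
    ("empezar", "hablar"),
    ("romper", "llorar"),
]

table = aux_by_inf_counts(cap_per_auxiliary(toy))
for aux, counts in table.items():
    print(aux, dict(counts))
```

A table of this shape (auxiliaries as rows, infinitives as columns) is one common starting point for frequency-based measures; which measures were actually passed to the Principal Components Analysis is described in the article, not here.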