Replication Data for "Using Greenberg’s Universal 45 in Universal Dependencies: Gender Distinctions and Annotation Challenges"

This dataset contains the results of a computational operationalization of Greenberg's Universal 45 applied to the Universal Dependencies (UD) corpus (version 2.14). The data covers 339 treebanks across 186 languages. For each treebank and Universal POS tag (UPOS) category, the dataset records whether gender distinctions are present in singular and/or plural tokens, and whether this combination constitutes a violation of the implicational universal, that is, gender distinctions in the plural without any in the singular. The dataset also includes aggregated counts of gender-marked tokens by treebank, language, and UPOS category.
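The per-treebank, per-UPOS check described above can be illustrated with a minimal Python sketch. It is not the author's implementation. The function name, the tuple layout of the input, and the treebank labels are hypothetical. It also assumes that a "gender distinction" means at least two different values of the UD `Gender` feature attested for a given `Number` value, which the description does not state.

```python
from collections import defaultdict


def universal45_flags(tokens):
    """Flag possible Universal 45 violations per (treebank, UPOS) pair.

    `tokens` is an iterable of (treebank, upos, number, gender) tuples,
    a hypothetical layout chosen for this sketch. `number` and `gender`
    hold UD feature values such as "Sing", "Plur", "Masc", "Fem".
    """
    # Collect the set of gender values seen for each number value.
    seen = defaultdict(lambda: {"Sing": set(), "Plur": set()})
    for treebank, upos, number, gender in tokens:
        if number in ("Sing", "Plur") and gender:
            seen[(treebank, upos)][number].add(gender)

    flags = {}
    for key, by_number in seen.items():
        # Assumption: a distinction needs two or more gender values.
        singular = len(by_number["Sing"]) >= 2
        plural = len(by_number["Plur"]) >= 2
        flags[key] = {
            "singular": singular,
            "plural": plural,
            # Violation: distinctions in the plural but not the singular.
            "violation": plural and not singular,
        }
    return flags


# Invented toy data, not taken from the dataset.
sample = [
    ("tb_a", "PRON", "Sing", "Masc"),
    ("tb_a", "PRON", "Sing", "Fem"),
    ("tb_a", "PRON", "Plur", "Masc"),
    ("tb_b", "PRON", "Sing", "Masc"),
    ("tb_b", "PRON", "Plur", "Masc"),
    ("tb_b", "PRON", "Plur", "Fem"),
]
flags = universal45_flags(sample)
print(flags[("tb_b", "PRON")]["violation"])
```

In this toy input, `tb_a` distinguishes gender only in the singular, which is consistent with the universal, while `tb_b` distinguishes gender only in the plural and is therefore flagged.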

Identifier
DOI https://doi.org/10.34810/DATA3130
Metadata Access https://dataverse.csuc.cat/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34810/DATA3130
Provenance
Creator Brosa Rodríguez, Antoni
Publisher CORA.Repositori de Dades de Recerca
Contributor Brosa Rodríguez, Antoni; Universitat Rovira i Virgili
Publication Year 2026
Rights CC BY 4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess true
Contact Brosa Rodríguez, Antoni (Universitat Rovira i Virgili)
Representation
Resource Type Textual data; Dataset
Format text/tab-separated-values; text/plain
Size 290548; 440; 28662; 140502; 14003; 1198; 11648
Version 1.0
Discipline Humanities; Linguistics