Czech OOV Inflection Dataset

Dataset

PID

Czech OOV Inflection Dataset is a Czech inflection dataset of nouns, focused on evaluation in out-of-vocabulary (OOV) conditions. It consists of two parts: a standard lemma-disjoint train-dev-test split of a subset of noun paradigms of existing morphological dictionary Czech MorfFlex 2.0 (files train, dev and test-MorfFlex); and small set of neologisms from Čeština 2.0, annotated for inflected forms (file test-neologisms).

Identifier
PID	http://hdl.handle.net/11234/1-5471
Related Identifier	https://doi.org/10.48550/arXiv.2404.08974
Related Identifier	https://github.com/tomsouri/cz-inflect
Metadata Access	http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-5471

Provenance
Creator	Sourada, Tomáš
Publisher	Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year	2024
Rights	Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); http://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess	true
Contact	lindat-help(at)ufal.mff.cuni.cz

Representation
Language	Czech
Resource Type	lexicalConceptualResource
Format	application/x-xz; application/octet-stream; downloadable_files_count: 1
Discipline	Linguistics