Verbs annotated for morphemic structure in Czech, English, German, Spanish

Dataset

A sample of verb lemmas in four languages: Czech (19,030 lemmas), English (9,965 lemmas), German (27,224 lemmas), and Spanish (11,888 lemmas). Each verb lemma is annotated for three things: its morphemic structure (i.e., it is segmented into the prefix(es), root(s), suffix(es), and ending(s) that it contains); the root morpheme to which its root morph is assigned, where needed (to facilitate grouping of verbs with the same root morpheme); and its frequency in a 100 M corpus. Two versions are available for each language: one with a more coarse-grained segmentation, which captures the morphemic structure that is synchronically available, and one with a more fine-grained segmentation, which also captures the word's etymology.
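
The annotation layers described above can be pictured as a small record per lemma. The following is only an illustrative sketch: the record does not describe the file format of the downloadable files, so the class name `VerbEntry`, its field names, and the helper `group_by_root_morpheme` are hypothetical, and the three German entries are hand-made examples rather than rows taken from the dataset.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class VerbEntry:
    """Hypothetical in-memory view of one annotated verb lemma."""
    lemma: str
    prefixes: Tuple[str, ...]
    roots: Tuple[str, ...]           # root morphs as they appear in the lemma
    suffixes: Tuple[str, ...]
    endings: Tuple[str, ...]
    root_morphemes: Tuple[str, ...]  # morpheme each root morph is assigned to
    frequency: Optional[int] = None  # corpus frequency; left empty in this sketch


def group_by_root_morpheme(entries: Iterable[VerbEntry]) -> Dict[str, List[str]]:
    """Collect lemmas that share a root morpheme, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        for morpheme in entry.root_morphemes:
            groups[morpheme].append(entry.lemma)
    return dict(groups)


# Hand-made illustrative entries (assumed segmentations, not dataset rows).
entries = [
    VerbEntry("kaufen", (), ("kauf",), (), ("en",), ("kauf",)),
    VerbEntry("verkaufen", ("ver",), ("kauf",), (), ("en",), ("kauf",)),
    VerbEntry("laufen", (), ("lauf",), (), ("en",), ("lauf",)),
]

groups = group_by_root_morpheme(entries)
```

Keeping the root morph and the root morpheme as separate fields mirrors the description: the morph is what is visible in the lemma, while the morpheme is the label used to group related verbs.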

Identifier
PID	http://hdl.handle.net/11234/1-5824
Metadata Access	http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-5824
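
The Metadata Access link above is an OAI-PMH `GetRecord` request. As a minimal sketch, the same URL can be assembled from its parts with the Python standard library. The function name `build_get_record_url` and the constant `OAI_ENDPOINT` are illustrative; the endpoint, metadata prefix, and identifier are the ones listed in this record. No network request is made here.

```python
from urllib.parse import urlencode

# Endpoint taken from the Metadata Access field of this record.
OAI_ENDPOINT = "http://lindat.mff.cuni.cz/repository/oai/request"


def build_get_record_url(identifier: str,
                         metadata_prefix: str = "oai_dc",
                         endpoint: str = OAI_ENDPOINT) -> str:
    """Assemble an OAI-PMH GetRecord URL for the given record identifier."""
    query = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    }
    # Keep ':' and '/' unescaped so the URL matches the form shown in the record.
    return endpoint + "?" + urlencode(query, safe=":/")


url = build_get_record_url("oai:lindat.mff.cuni.cz:11234/1-5824")
```

The resulting `url` can be passed to any HTTP client to retrieve the Dublin Core metadata as XML.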

Provenance
Creator	Hledíková, Hana
Publisher	Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year	2024
Rights	Creative Commons - Attribution 4.0 International (CC BY 4.0); http://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess	true
Contact	lindat-help(at)ufal.mff.cuni.cz

Representation
Language	Czech; English; German; Spanish; Castilian
Resource Type	lexicalConceptualResource
Format	application/octet-stream; downloadable_files_count: 8
Discipline	Linguistics