Lexical Substitution Dataset for German

This article describes a lexical substitution dataset for German. The dataset contains 2,040 sentences from the German Wikipedia, with one target word in each sentence. There are 51 target nouns, 51 adjectives, and 51 verbs, randomly selected from 3 frequency groups based on the lemma frequency list of the German WaCKy corpus. 200 sentences were annotated by 4 professional annotators, and the remaining sentences by 1 professional annotator and 5 additional annotators recruited via crowdsourcing. The resulting dataset can be used to evaluate not only lexical substitution systems but also different sense inventories and word sense disambiguation systems.

Identifier
Source https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2436
Related Identifier IsSupplementTo http://www.lrec-conf.org/proceedings/lrec2014/pdf/545_Paper
Metadata Access https://tudatalib.ulb.tu-darmstadt.de/server/oai/openairedata?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:tudatalib.ulb.tu-darmstadt.de:tudatalib/2436
Provenance
Creator Cholakov, Kostadin; Biemann, Chris; Eckle-Kohler, Judith; Gurevych, Iryna
Publisher Technische Universität Darmstadt
Publication Year 2020
Rights CC BY-SA 3.0; info:eu-repo/semantics/openAccess; https://creativecommons.org/licenses/by-sa/3.0/
OpenAccess true
Contact https://tudatalib.ulb.tu-darmstadt.de/docs/en/kontakt/
Representation
Resource Type Dataset
Format application/octet-stream
Size 360.87 KB
Discipline Other
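The Metadata Access link above is an OAI-PMH GetRecord request (verb GetRecord, metadataPrefix oai_datacite, plus the record identifier). The sketch below shows how that request URL is composed and how the raw XML could be downloaded. It is a minimal illustration under stated assumptions: the helper names `build_getrecord_url` and `fetch_record_xml` are hypothetical, the base endpoint and identifier are simply copied from the URL in this record, and parsing of the returned DataCite payload is deliberately left out because its exact structure is not described here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and record identifier copied from the Metadata Access URL in this record.
OAI_BASE = "https://tudatalib.ulb.tu-darmstadt.de/server/oai/openairedata"
RECORD_ID = "oai:tudatalib.ulb.tu-darmstadt.de:tudatalib/2436"


def build_getrecord_url(base: str, identifier: str, prefix: str = "oai_datacite") -> str:
    """Compose an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode(
        {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier},
        safe=":/",  # leave ':' and '/' unescaped, as in the URL listed above
    )
    return f"{base}?{query}"


def fetch_record_xml(url: str, timeout: float = 30.0) -> str:
    """Download the raw XML response (hypothetical helper; no parsing is done)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Prints the same request URL that the record lists under Metadata Access.
    print(build_getrecord_url(OAI_BASE, RECORD_ID))
```

Changing `prefix` would request a different metadata format, assuming the repository advertises it; OAI-PMH's ListMetadataFormats verb reports which formats are actually available.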