Dataset - B2FIND

Expanding WordNet with Gloss and Polysemy Links for Evocation Strength Recogn...

Evocation — a phenomenon of sense associations going beyond standard (lexico)-semantic relations — is difficult to recognise for natural language processing systems. Machine...

Registers in the System of Semantic Relations in plWordNet

Lexicalised concepts are represented in wordnets by word-sense pairs. The strength of markedness is one of the factors which influence word use. Stylistically unmarked words are...

COSTRA 1.0: A Dataset of Complex Sentence Transformations

COSTRA 1.0 is a dataset of Czech complex sentence transformations. The dataset is intended for the study of sentence-level embeddings beyond simple word alternations or standard...

Prague Dependency Treebank - Consolidated 2.0 (PDT-C 2.0)

A manually annotated and genre-diversified language resource with rich linguistic information from morphology and syntax to semantics, the Prague Dependency Treebank –...

Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0)

A richly annotated and genre-diversified language resource, the Prague Dependency Treebank – Consolidated 1.0 (PDT-C 1.0, hereafter PDT-C) is a consolidated...

Prague Dependency Treebank 3.5

The Prague Dependency Treebank 3.5 is the 2018 edition of the core Prague Dependency Treebank (PDT). It contains all PDT annotation made at the Institute of Formal and Applied...

Czech Legal Text Treebank 2.0

The Czech Legal Text Treebank 2.0 (CLTT 2.0) annotates the same texts as CLTT 1.0. These texts come from the legal domain and are manually annotated for syntax. The...

TermFrame: Terms, definitions and semantic annotations for karstology

The resource comprises several datasets of domain-specific data in three languages (English, Slovenian and Croatian), which can be used for various knowledge extraction or...

ILSP Conceptual Dictionary of Modern Greek (ELEXIS)

ConceptNet-el (Εννοιολογικό Λεξικό της Νέας Ελληνικής ΙΕΛ). ConceptNet-el is a conceptual dictionary of Modern Greek that assumes the form of a linguistic ontology. It...

Slovene corpus for general relation extraction SloREL 1.0

The SloREL corpus contains annotations for training relation extraction models on Slovene documents. It contains documents from Slovene Wikipedia with annotated entities and...

Slovene corpus for general relation extraction SloREL 1.1

The SloREL corpus contains annotations for training relation extraction models on Slovene documents. It contains documents from Slovene Wikipedia with annotated entities and...

WUT Relations Between Sentences Corpus

WUT Relations Between Sentences Corpus contains 2827 pairs of related sentences. Relationships are derived from Cross-document Structure Theory (CST), which enables...

12 datasets found