Dataset - B2FIND

Coreference in Universal Dependencies 1.4 (CorefUD 1.4)

CorefUD is a collection of previously existing coreference-annotated datasets that have been converted to a unified annotation scheme. In its current version (1.4), CorefUD...

Human Label Variation in Coreference (Hlava Cor)

Human Label Variation in Coreference (Hlava COR) is a collection of commented multiple annotations (three annotators) of coreferential relations in Czech, i.e. the annotation of...

Tagset: meta-annotation of mention spans

This tagset provides labels to assign formal categories to mention spans produced in the process of coreference annotation. The labels have been developed for German and might...

Annotationsguidelines für die Evaluation automatischer Koreferenzannotation (Annotation Guidelines for the Evaluation of Automatic Coreference Annotation)

Guidelines for manual evaluation of automatic coreference annotation of German language data with CorPipe (Straka 2023). Funded by the Deutsche Forschungsgemeinschaft...

Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0)

The Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0) is a corpus of spoken language, consisting of 742,316 tokens and 73,835 sentences, representing 7,324 minutes...

Coreference in Universal Dependencies 1.3 (CorefUD 1.3)

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version...

Prague Discourse Treebank 2.0

PDiT 2.0 is a new version of the Prague Discourse Treebank. It contains a complex annotation of discourse phenomena enriched by the annotation of secondary connectives.

DiscoMT 2016 Shared Task on Cross-lingual Pronoun Prediction

Files for the DiscoMT 2016 shared task on cross-lingual pronoun prediction.

ParCorFull: A Parallel Corpus Annotated with Full Coreference

ParCorFull is a parallel corpus annotated with full coreference chains that has been created to address an important problem that machine translation and other multilingual...

Coreference in Universal Dependencies 1.1 (CorefUD 1.1)

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version...

Coreference in Universal Dependencies 1.0 (CorefUD 1.0)

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version...

Coreference in Universal Dependencies 0.1 (CorefUD 0.1)

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version...

Coreference in Universal Dependencies 0.2 (CorefUD 0.2)

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version...

Prague Czech-English Dependency Treebank 2.0 Coref

The Prague Czech-English Dependency Treebank 2.0 Coref (PCEDT 2.0 Coref) is a parallel treebank building upon the original PCEDT 2.0 release and enriching it with the extended...

Prague Dependency Treebank - Consolidated 2.0 (PDT-C 2.0)

A manually annotated and genre-diversified language resource with rich linguistic information from morphology and syntax to semantics, the Prague Dependency Treebank –...

DiscoMT 2017 Shared Task on Cross-lingual Pronoun Prediction

Data used in the 2017 shared task on cross-lingual pronoun prediction.

Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0)

A richly annotated and genre-diversified language resource, the Prague Dependency Treebank – Consolidated 1.0 (PDT-C 1.0, or PDT-C for short) is a consolidated...

Coreference in Universal Dependencies 1.2 (CorefUD 1.2)

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version...

PAWS

PAWS is a multi-lingual parallel treebank with coreference annotation. It consists of English texts from the Wall Street Journal translated into Czech, Russian and Polish. In...

Prague Dependency Treebank 3.5

The Prague Dependency Treebank 3.5 is the 2018 edition of the core Prague Dependency Treebank (PDT). It contains all PDT annotation made at the Institute of Formal and Applied...

24 datasets found