Dataset - B2FIND

Uniform Meaning Representation 2.1 (Czech and Latin)

Czech and Latin UMR data, both manually annotated and programmatically converted from manually annotated tectogrammatical data.

Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0)

The Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0) is a corpus of spoken language, consisting of 742,316 tokens and 73,835 sentences, representing 7,324 minutes...

Universal Dependencies 2.7

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Coreference in Universal Dependencies 1.3 (CorefUD 1.3)

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version...

Universal Dependencies 2.6

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Universal Dependencies 2.14

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Prague Discourse Treebank 2.0

PDiT 2.0 is a new version of the Prague Discourse Treebank. It contains a complex annotation of discourse phenomena enriched by the annotation of secondary connectives.

Universal Dependencies 2.8

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Extended Textual Coreference and Bridging Relations in PDT 2.0

Annotation of extended textual coreference and bridging relations in the Prague Dependency Treebank 2.0.

Coreference in Universal Dependencies 1.1 (CorefUD 1.1)

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version...

Universal Dependencies 2.12

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Coreference in Universal Dependencies 1.0 (CorefUD 1.0)

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version...

Universal Dependencies 2.2

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

ELITR Minuting Corpus

ELITR Minuting Corpus consists of transcripts of meetings in Czech and English, their manually created summaries ("minutes") and manual alignments between the two. Czech...

Coreference in Universal Dependencies 0.1 (CorefUD 0.1)

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version...

Coreference in Universal Dependencies 0.2 (CorefUD 0.2)

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version...

Prague Czech-English Dependency Treebank 2.0 Coref

The Prague Czech-English Dependency Treebank 2.0 Coref (PCEDT 2.0 Coref) is a parallel treebank building upon the original PCEDT 2.0 release and enriching it with the extended...

Universal Dependencies 2.3

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Universal Dependencies 2.5

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Prague Dependency Treebank - Consolidated 2.0 (PDT-C 2.0)

A manually annotated and genre-diversified language resource with rich linguistic information from morphology and syntax to semantics, the Prague Dependency Treebank –...

37 datasets found