- CMC training corpus Janes-Syn 1.0
  Janes-Syn is a syntactically annotated corpus of Slovene tweets and is meant as a gold-standard training and testing dataset for syntactic annotation of Slovene...
- Languages in Migration
  LANGUAGES IN MIGRATION is designed as a representation of authentic spoken Czech and German that is used in informal speech (private environment, spontaneity, unpreparedness...
- The IPI PAN Corpus
  written, general, monolingual, synchronic; 250 million; XML (XCES), morphosyntactic, structural, metada
- Copenhagen Dependency Treebanks versions 1-3
  Parallel treebanks with annotation of syntax, discourse, coreference, morphology, and semantics. Version 3 also includes the Danish Dependency Treebank (version 1) and the...
- Czesl - Universal Dependencies Release 0.5
  Syntactic annotation of 1600 sentences from the Czesl-MAN corpus using the framework of Universal Dependencies 2.3