-
Parallel sense-annotated corpus ELEXIS-WSD 1.3
ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.3 contains sentences for 10... -
Universal Dependencies 2.17 models for UDPipe 2 (2025-11-25)
Tokenizer, POS Tagger, Lemmatizer and Parser models for 169 treebanks of 93 languages of Universal Depenencies 2.17 Treebanks, created solely using UD 2.17 data... -
Multilingual comparable corpora of parliamentary debates ParlaMint 5.0
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Multilingual comparable corpora of parliamentary debates ParlaMint 4.1
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Wortschatz
Collected from newspaper texts, webcrawling, etc.: words (+frequency), cooccurrences (+graph), left/right neighbours, example sentences -
Deltacorpus
Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger... -
Deep Universal Dependencies 2.4
Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-2988). It contains additional... -
BulTreeBank Frequency List
100 000 most frequent Cyrillic tokens in the BulTreeBank text archive, UTF-16 list of token-frequency pairs -
Universal Dependencies 1.3
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
Universal Dependencies 2.7
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
Ocorrect Corpus
Written, synchronic, general, bilingual, text and image; 1 000 000 tokens Bulgarian2300 image files150 000 tokens Greman312 image files -
Universal Dependencies 2.6
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
C4Corpus (publicdomain part)
A large web corpus (over 10 billion tokens) licensed under CreativeCommons license family in 50+ languages that has been extracted from CommonCrawl, the largest publicly... -
Universal Dependencies 2.14
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
Universal Dependencies 1.4
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
Annotated corpora and tools of the PARSEME Shared Task on Automatic Identific...
The PARSEME shared task aims at identifying verbal MWEs in running texts. Verbal MWEs include idioms (let the cat out of the bag), light verb constructions (make a decision),... -
Universal Dependencies 2.8
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data
CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies This package contains the test data in the form in which they ware presented to...
