-
Multilingual training dataset for CAP policy topic classification ParlaCAP-train
The multilingual training dataset for CAP policy topic classification ParlaCAP-train is a collection of parliamentary speeches in 29 European languages, automatically annotated... -
Coreference in Universal Dependencies 1.4 (CorefUD 1.4)
CorefUD is a collection of previously existing coreference-annotated datasets that have been converted to a unified annotation scheme. In its current version (1.4), CorefUD... -
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Multilingual comparable corpora of parliamentary debates ParlaMint 5.0
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Parallel sense-annotated corpus ELEXIS-WSD 1.3
ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.3 contains sentences for 10... -
Einstellung zur Monarchie (Niederlande) Attitude to the Monarchy (Netherlands)
Attitudes of the Dutch towards Germans and towards the monarchy. Topics: attitudes towards the monarchy and towards Germans; survey on the occasion of the engagement of Princess... -
Universal Dependencies 2.17 models for UDPipe 2 (2025-11-25)
Tokenizer, POS Tagger, Lemmatizer and Parser models for 169 treebanks of 93 languages of Universal Dependencies 2.17 Treebanks, created solely using UD 2.17 data... -
Multilingual comparable corpora of parliamentary debates ParlaMint 4.1
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Delftse Bijbel 1477
Digitised version of the Delftse Bijbel 1477 -
Wortschatz
Collected from newspaper texts, web crawling, etc.: words (+frequency), co-occurrences (+graph), left/right neighbours, example sentences -
Deltacorpus
Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger... -
Deep Universal Dependencies 2.4
Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-2988). It contains additional... -
IFA speech corpus
Spoken corpus containing speech of 4 male and 4 female speakers. 50,000 words segmented at phoneme level -
Universal Dependencies 1.3
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
Project Gutenberg
Possibility to download or to browse free electronic books; offering: download of and online access to freely available e-books; German-language literature makes up only a... -
Plant names in Dutch dialect (PLAND)
Plant names in Dutch dialect -
Universal Dependencies 2.7
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
OpenTaal word list
Free Dutch word list, suitable for spell checkers etc. - see http://opentaal.org/english.php -
Universal Dependencies 2.6
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...
