Dataset - B2FIND

Multilingual training dataset for CAP policy topic classification ParlaCAP-train

The multilingual training dataset for CAP policy topic classification ParlaCAP-train is a collection of parliamentary speeches in 29 European languages, automatically annotated...

Coreference in Universal Dependencies 1.4 (CorefUD 1.4)

CorefUD is a collection of previously existing coreference-annotated datasets that have been converted to a unified annotation scheme. In its current version (1.4), CorefUD...

Linguistically annotated multilingual comparable corpora of parliamentary deb...

ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...

Multilingual comparable corpora of parliamentary debates ParlaMint 5.0

ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...

Parallel sense-annotated corpus ELEXIS-WSD 1.3

ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.3 contains sentences for 10...

Einstellung zur Monarchie (Niederlande) / Attitude to the Monarchy (Netherlands)

Attitude of the Dutch towards the Germans and the monarchy. Topics: attitudes towards the monarchy and towards Germans; survey on the occasion of the engagement of Princess...

Universal Dependencies 2.17 models for UDPipe 2 (2025-11-25)

Tokenizer, POS tagger, lemmatizer and parser models for 169 treebanks of 93 languages of Universal Dependencies 2.17 Treebanks, created solely using UD 2.17 data...

Multilingual comparable corpora of parliamentary debates ParlaMint 4.1

ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...

Linguistically annotated multilingual comparable corpora of parliamentary deb...

ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...

Delftse Bijbel 1477

Digitised version of the Delftse Bijbel 1477

Wortschatz

Collected from newspaper texts, web crawling, etc.: words (with frequencies), co-occurrences (with graphs), left/right neighbours, example sentences

Deltacorpus

Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger...

Deep Universal Dependencies 2.4

Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-2988). It contains additional...

IFA speech corpus

Spoken corpus containing speech from 4 male and 4 female speakers; 50,000 words segmented at the phoneme level

Universal Dependencies 1.3

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Project Gutenberg

Free electronic books that can be downloaded or browsed online; German-language literature makes up only a...

Plant names in Dutch dialect (PLAND)

Plant names in Dutch dialect

Universal Dependencies 2.7

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

OpenTaal word list

Free Dutch word list, suitable for spell checkers etc. - see http://opentaal.org/english.php

Universal Dependencies 2.6

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

231 datasets found