Parallel sense-annotated corpus ELEXIS-WSD 1.3
ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.3 contains sentences for 10...
Universal Dependencies 2.17 models for UDPipe 2 (2025-11-25)
Tokenizer, POS Tagger, Lemmatizer and Parser models for 169 treebanks of 93 languages of Universal Dependencies 2.17 Treebanks, created solely using UD 2.17 data...
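UDPipe 2 models emit their analyses in the CoNLL-U format used by Universal Dependencies: ten tab-separated columns per word, `#` comment lines, and a blank line after each sentence. The following is a minimal Python sketch for reading such output. `read_conllu` and `Token` are hypothetical helpers, not part of UDPipe, and for simplicity the sketch skips multiword-token ranges (IDs like `1-2`) and empty nodes (IDs like `5.1`).

```python
from dataclasses import dataclass


@dataclass
class Token:
    # The ten CoNLL-U columns, kept as raw strings ("_" marks an empty field).
    id: str
    form: str
    lemma: str
    upos: str
    xpos: str
    feats: str
    head: str
    deprel: str
    deps: str
    misc: str


def read_conllu(text: str) -> list[list[Token]]:
    """Split CoNLL-U text into sentences made of Token records (hypothetical helper)."""
    sentences: list[list[Token]] = []
    current: list[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are ignored here.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            # Skip multiword-token ranges and empty nodes; keep syntactic words only.
            continue
        current.append(Token(*cols))
    if current:
        # Handle input that does not end with a blank line.
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    sample = (
        "# text = Dogs bark.\n"
        "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
        "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No\n"
        "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
        "\n"
    )
    for tok in read_conllu(sample)[0]:
        print(tok.form, tok.upos, tok.head, tok.deprel)
```

The sketch keeps every field as a string. A real consumer would probably convert `head` to an integer and split `feats` on `|`.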
LegISTyr test set
LegISTyr is a machine translation test set for evaluating the quality of legal terminology translation from Italian to South Tyrolean German, a minor standard variety of German....
StarwarsNER French Italian Corpus - sample
The StarwarsNER French Italian Corpus - sample is a multilingual benchmark resource for Named Entity Recognition (NER) in the wastewater and stormwater management domain. It...
KIParla - KIPasti transcripts
The KIPasti corpus is part of the larger KIParla collection (www.kiparla.it), which can be freely queried through the NoSketch Engine interface. The KIPasti corpus was compiled...
KIParla - ParlaTO transcripts
The ParlaTO corpus is part of the larger KIParla collection (www.kiparla.it), which can be freely queried through the NoSketch Engine interface. The ParlaTO corpus was...
KIParla - ParlaBO transcripts
The ParlaBO corpus is part of the larger KIParla collection, which can be freely queried through the NoSketch Engine interface. The ParlaBO corpus was compiled within the...
KIParla - KIP transcripts
The KIP corpus is part of the larger KIParla collection (www.kiparla.it), which can be freely queried through the NoSketch Engine interface. The KIP corpus was compiled within...
MIXPAR Database: Version 1.0 (September 2025)
MIXPAR: A Database of Mixed Perfective Auxiliation in Italo-Romance (v1.0). This is the first public release (v1.0) of the MIXPAR database, a large-scale dataset documenting...
Multilingual comparable corpora of parliamentary debates ParlaMint 5.0
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...
Multilingual comparable corpora of parliamentary debates ParlaMint 4.1
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...
Wortschatz
Collected from newspaper texts, web crawls, etc.: words (with frequencies), co-occurrences (with a co-occurrence graph), left/right neighbours, and example sentences.
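To make the listed word statistics concrete, here is a minimal Python sketch that computes raw counts of each one from tokenized sentences: word frequencies, left/right neighbours, and sentence-level co-occurrences. `corpus_stats` is a hypothetical helper. It uses raw counts only and makes no claim about the tokenization or the statistics Wortschatz itself applies when building its data.

```python
from collections import Counter
from itertools import combinations


def corpus_stats(sentences: list[list[str]]) -> tuple[Counter, dict, dict, Counter]:
    """Return (frequencies, left neighbours, right neighbours, co-occurrences)."""
    freq: Counter = Counter()
    left: dict = {}   # word -> Counter of words immediately to its left
    right: dict = {}  # word -> Counter of words immediately to its right
    cooc: Counter = Counter()  # (word_a, word_b) with word_a < word_b -> sentence count
    for tokens in sentences:
        freq.update(tokens)
        # Adjacent pairs give the left/right neighbour counts.
        for a, b in zip(tokens, tokens[1:]):
            right.setdefault(a, Counter())[b] += 1
            left.setdefault(b, Counter())[a] += 1
        # Each unordered pair of distinct words is counted once per sentence.
        for a, b in combinations(sorted(set(tokens)), 2):
            cooc[(a, b)] += 1
    return freq, left, right, cooc


if __name__ == "__main__":
    freq, left, right, cooc = corpus_stats(
        [["the", "cat", "sleeps"], ["the", "dog", "sleeps"]]
    )
    print(freq.most_common(2))
    print(right["the"], left["sleeps"])
    print(cooc[("sleeps", "the")])
```

Sorting the distinct words before pairing them gives each co-occurrence a single canonical key, so `("sleeps", "the")` and `("the", "sleeps")` are not counted separately.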
Deltacorpus
Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger...
Deep Universal Dependencies 2.4
Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-2988). It contains additional...
Copenhagen Dependency Treebanks versions 1-3
Parallel treebanks with annotation of syntax, discourse, coreference, morphology, and semantics. Version 3 also includes the Danish Dependency Treebank (version 1) and the...
Multilingual static embeddings for Verbal Multiword Expressions trained on PA...
This resource is a set of 14 vector spaces for single words and Verbal Multiword Expressions (VMWEs) in different languages (German, Greek, Basque, French, Irish, Hebrew, Hindi,...
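Static vector spaces like these are typically queried by ranking entries by cosine similarity to a query vector. The minimal Python sketch below shows that lookup. The file format of the resource and the way VMWEs are keyed are not stated here, so the underscore-joined key `take_place` and all vector values are hypothetical toy data, not taken from the resource. `cosine` and `nearest` are illustrative helpers.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query: str, space: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k entries most similar to `query`, excluding the query itself."""
    qv = space[query]
    scored = [(w, cosine(qv, v)) for w, v in space.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy space: one VMWE entry plus single words.
    space = {
        "take_place": [0.9, 0.1, 0.0],
        "happen": [0.8, 0.2, 0.1],
        "occur": [0.85, 0.15, 0.05],
        "table": [0.0, 0.1, 0.9],
    }
    for word, score in nearest("take_place", space, k=2):
        print(f"{word}\t{score:.3f}")
```

A linear scan is fine for a sketch. For full vocabularies, normalizing all vectors once and using a matrix product (e.g. with NumPy) would be the usual choice.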
The National Certificates corpus
NC (National Certificates) test results, background information, and speaking and writing performances in 9 foreign/second languages. A web-based database (HTML files).
