-
Universal Dependencies 2.17 models for UDPipe 2 (2025-11-25)
Tokenizer, POS Tagger, Lemmatizer and Parser models for 169 treebanks of 93 languages of Universal Depenencies 2.17 Treebanks, created solely using UD 2.17 data... -
Treebanks for Unified Taxonomy of Deep Syntactic Relations
The datasets described in Droganova, Kira, and Daniel Zeman. "Towards a Unified Taxonomy of Deep Syntactic Relations." Proceedings of the 2024 Joint International Conference on... -
Multilingual comparable corpora of parliamentary debates ParlaMint 5.0
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Multilingual comparable corpora of parliamentary debates ParlaMint 4.1
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Wortschatz
Collected from newspaper texts, webcrawling, etc.: words (+frequency), cooccurrences (+graph), left/right neighbours, example sentences -
Digitized Press
Collection of different digitized mastheads in Catalan and Spanish, covering a time span from 1808 to 2008. The collection, which is kept in the Girona City Council Archive,... -
Deltacorpus
Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger... -
Deep Universal Dependencies 2.4
Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-2988). It contains additional... -
Corpus Tècnic de l'IULA
domain specific corpus (Law, Economy, Computing, Medicine and Environment as well as a contrastive corpus from the press); EN 3.3 M tokens, SP 33 M tokens, CAT 19 M tokens;... -
Universal Dependencies 1.3
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
SVMTool
Generator of sequential taggers based on Support Vector Machines. -
Universal Dependencies 2.7
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
Coreference in Universal Dependencies 1.3 (CorefUD 1.3)
CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version... -
Universal Dependencies 2.6
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
TeLeMaCo
A collection of pointers to teaching and learning materials on linguistics and linguistic tools, including quick starts, how-tos, technical documentation, short teaching modules... -
Banco de neologismos 2004-2007
Repository of neologisms (15.375 entries) -
Universal Dependencies 2.14
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
BUSCANEO
Tool for neologism extraction.
