Dataset - B2FIND

Universal Dependencies 2.17 models for UDPipe 2 (2025-11-25)

Tokenizer, POS Tagger, Lemmatizer and Parser models for 169 treebanks of 93 languages of Universal Dependencies 2.17 Treebanks, created solely using UD 2.17 data...

Treebanks for Unified Taxonomy of Deep Syntactic Relations

The datasets described in Droganova, Kira, and Daniel Zeman. "Towards a Unified Taxonomy of Deep Syntactic Relations." Proceedings of the 2024 Joint International Conference on...

Multilingual comparable corpora of parliamentary debates ParlaMint 5.0

ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...

Linguistically annotated multilingual comparable corpora of parliamentary deb...

ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...

Multilingual comparable corpora of parliamentary debates ParlaMint 4.1

ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...

Linguistically annotated multilingual comparable corpora of parliamentary deb...

ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...

Wortschatz

Collected from newspaper texts, web crawling, etc.: words (with frequencies), co-occurrences (with graphs), left/right neighbours, example sentences

Digitized Press

Collection of different digitized mastheads in Catalan and Spanish, covering a time span from 1808 to 2008. The collection, which is kept in the Girona City Council Archive,...

Deltacorpus

Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger...

Deep Universal Dependencies 2.4

Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-2988). It contains additional...

Corpus Tècnic de l'IULA

Domain-specific corpus (Law, Economy, Computing, Medicine and Environment, as well as a contrastive corpus from the press); EN 3.3 M tokens, SP 33 M tokens, CAT 19 M tokens;...

Universal Dependencies 1.3

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

SVMTool

Generator of sequential taggers based on Support Vector Machines.

Universal Dependencies 2.7

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Coreference in Universal Dependencies 1.3 (CorefUD 1.3)

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version...

Universal Dependencies 2.6

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

TeLeMaCo

A collection of pointers to teaching and learning materials on linguistics and linguistic tools, including quick starts, how-tos, technical documentation, short teaching modules...

Banco de neologismos 2004-2007

Repository of neologisms (15,375 entries)

Universal Dependencies 2.14

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

BUSCANEO

Tool for neologism extraction.

136 datasets found