-
Universal Dependencies 2.7
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
Plaintext Wikipedia dump 2018
Wikipedia plain text data obtained from Wikipedia dumps with WikiExtractor in February 2018. The data come from all Wikipedias for which dumps could be downloaded at... -
Multilingual corpus of literal occurrences of multiword expressions
The corpus contains sentences with idiomatic, literal and coincidental occurrences of verbal multiword expressions (VMWEs) in Basque, German, Greek, Polish and Portuguese. The... -
OmegaWiki
This dataset has no description
-
Universal Derivations v0.5
Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent... -
Universal Dependencies 2.9
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
Universal Dependencies 2.11
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
W2C – Web to Corpus – Corpora
A set of corpora for 120 languages, automatically collected from Wikipedia and the web using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1 -
Deltacorpus 1.1
Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger... -
Morpho-syntactically annotated corpora provided for the PARSEME Shared Task o...
This multilingual resource contains corpora for 14 languages, gathered on the occasion of edition 1.2 of the PARSEME Shared Task on semi-supervised Identification of Verbal... -
It-Sr-NER
The It-Sr-NER tool is a CLARIN-compatible NER web service for parallel texts, with a case study on Italian and Serbian; it can be used for recognizing and classifying named entities in...