-
Diccionario de neologismos on line
Lexicographic resource containing 3,530 neologisms documented in the Spanish-language press between 1989 and 2007. -
Universal Dependencies 2.8
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data
CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies. This package contains the test data in the form in which they were presented to... -
Universal Dependencies 1.0
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
VOLEM
Multilingual verbal lexicon: Catalan, Spanish (connection with French and Basque from other groups) -
Deep Universal Dependencies 2.5
Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3105). It contains additional... -
Corpus bilingüe d’alternança de llengües (codeswitching)
8 interactive recordings of group dynamics. Bilingual speakers (L1 -> English; L1 -> Catalan/Spanish). -
HamleDT 2.0
HamleDT 2.0 is a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a... -
Universal Segmentations 1.0 (UniSegments 1.0)
Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations harmonised into a cross-linguistically consistent annotation... -
WMT 13 Test Set
We provide the Vietnamese version of the multi-lingual test set from WMT 2013 [1] competition. The Vietnamese version was manually translated from English. For completeness,... -
OmegaWiki
This dataset has no description
-
Universal Derivations v1.1
Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent... -
SynSemClass 5.0
The SynSemClass synonym verb lexicon version 5.0 is a multilingual resource that enriches previous editions of this event-type ontology with a new language, Spanish. The... -
Multilingual Central Repository
Multilingual lexical database that follows the model proposed by the EuroWordNet project. The MCR integrates into the same EuroWordNet framework wordnets from five different... -
Universal Dependencies 2.4 Models for UDPipe (2019-05-31)
Tokenizer, POS tagger, lemmatizer and parser models for 90 treebanks of 60 languages of Universal Dependencies 2.4 treebanks, created solely using UD 2.4 data... -
NameTag 3 Multilingual Model 250203
This is a trained model for the supervised machine learning tool NameTag 3 (https://ufal.mff.cuni.cz/nametag/3/). NameTag 3 is an open-source tool for both flat and nested named... -
CorPipe 23 multilingual CorefUD 1.2 model (corpipe23-corefud1.2-240906)
The corpipe23-corefud1.2-240906 is an mT5-large-based multilingual model for coreference resolution usable in CorPipe 23 (https://github.com/ufal/crac2023-corpipe). It is released... -
C4Corpus (CC BY-SA part)
A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family, that has been extracted from CommonCrawl, the largest publicly... -
Bwananet
Tool for querying the Technical Corpus of the Institut Universitari de Lingüística Aplicada. -
PALIC
A package of tools for processing the Corpus Tècnic in Catalan and Spanish. It includes a preprocessor, a PoS tagger and a linguistic disambiguator.
