Dataset - B2FIND

Galician LMF Apertium Dictionary

This is the LMF version of the Galician Apertium dictionary. Monolingual dictionaries for Spanish, Catalan, Galician and Euskera have been generated from the Apertium expanded...

French-Spanish LMF Apertium Bilingual dictionary

This is the LMF version of the Apertium bilingual dictionary for French and Spanish languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For...

Spanish LMF Parole/Simple Lexicon

This is the LMF version of the Spanish Parole-Simple lexicon. The original PAROLE lexica (20,000 entries per language) were built to conform to a model based on EAGLES guidelines...

LMF version of the SenSem Spanish Data Base

This is the LMF version of the SenSem database created by the Spanish Inter-University Research Group GRIAL. As part of the SenSem project, a corpus of sentences annotated at the...

Corpus92 Corpus

The corpus consists of a number of texts corresponding to Access to University examinations held in June 1992 in several Spanish universities. It contains about 350,000 words...

English-Galician CLUVI Dictionary

This is the LMF version of the English-Galician CLUVI Dictionary developed under the direction of Xavier Gómez Guinovart (2005-2012) from parallel texts in the CLUVI Corpus of...

IULA Spanish-English Technical Corpus

The corpus consists of a number of specialized texts (Law, Economics, Medicine, Environment and Computer Science domains) available in both Spanish and English languages. This...

CLUVI Parallel Corpus

The CLUVI Corpus of the University of Vigo is an open collection of parallel text corpora developed under the direction of Xavier Gómez Guinovart (2003-2012) that covers...

GrAF version of Catalan portions of Wikipedia Corpus

This is the stand-off GrAF version of Catalan portions of the Wikipedia (based on a 2006 dump). This Wikipedia Catalan Corpus contains 122052 articles that contain about 47,3...

IULA Penn Treebank

This treebank consists of a number of Spanish and English sentences that have been manually annotated with syntactic information. The sentences have been chosen from the Penn...

IULA Spanish LSP Treebank

This treebank consists of a number of syntactically analyzed sentences. The sentences have been chosen from the IULA LSP corpus, automatically annotated with POS information...

GrAF version of Spanish portions of Wikipedia Corpus

This is the stand-off GrAF version of Spanish portions of the Wikipedia (based on a 2006 dump). This Wikipedia Spanish Corpus contains 257019 articles that contain about 150,1...

MATE Parser module for Spanish

In this package we include the following: logonFinal20130315_4matetools361.model; parse_ESCAsentences_mate.sh; freeling_spaMate.sh; toconll2006.py; prueba.txt (test file: 4...

PANACEA Environment Multi Word Italian Lexicon

The Environment MW Italian Lexicon is a lexicon of noun-noun multiword expressions automatically extracted from a 36Mio word web-crawled corpus in the environmental domain....

PANACEA Italian V-SUBCAT Repubblica lexicon (language independent extractor)

This is a lexicon of verb subcategorisation frames automatically extracted from a 300Mio word newspaper corpus using language-independent SCF acquisition software. The...

PANACEA Italian V-SUBCAT Repubblica lexicon (language dependent extractor)

The OpenDomain SCF Italian Lexicon is a lexicon of verb subcategorisation frames automatically extracted from a 300Mio word newspaper corpus using a language-dependent SCF...

PANACEA Labour Multi Word Italian Lexicon

The Labour MW Italian Lexicon is a lexicon of noun-noun multiword expressions automatically extracted from a 70Mio word web-crawled corpus in the labour law domain. The...

PANACEA Spanish multi-level, multi-domain lexicon

This is a multi-level, multi-domain lexicon for Spanish. It combines the automatically acquired lexica for ENV and LAB domains using the PANACEA platform and some general domain...

PANACEA Labour SCF MWE merged Italian Lexicon

The Italian PANACEA_LAB_SCF_MWE_merged.lmf.xml lexicon is obtained by merging two automatically extracted lexicons: a domain lexicon (labour) for SCFs,...

PANACEA Environment SCF MWE merged Italian Lexicon

The Italian PANACEA_ENV_MWE_SCF_merged.lmf.xml lexicon is obtained by merging two automatically extracted lexicons: a domain lexicon (environment) for SCFs,...

118 datasets found