Dataset - B2FIND

ECODE: Extractor de Contextos Definitorios

Source code of a system based on linguistic rules for extracting definitional contexts from specialized texts in Spanish. This system consists of...

Quantification of chemical components present in phellem and phelloderm + plh...

The dataset contains the chemical composition (water content, fiber, six carbohydrates, total proteins, organic matter, and lignin) of the subcortical tissues (phellem and...

IULA Spanish-English Technical Corpus

The corpus consists of a number of specialized texts (Law, Economics, Medicine, Environment and Computer Science domains) available in both Spanish and English. This...

IULA Penn Treebank

This treebank consists of a number of Spanish and English sentences that have been manually annotated with syntactic information. The sentences have been chosen from the Penn...

LMF version of the SenSem Spanish Data Base

This is the LMF version of the SenSem database created by the Spanish Inter-University Research Group GRIAL. As part of the SenSem project, a corpus of sentences annotated at the...

MATE Parser module for Spanish

In this package we include the following: logonFinal20130315_4matetools361.model; parse_ESCAsentences_mate.sh; freeling_spaMate.sh; toconll2006.py; prueba.txt (test file: 4...

PANACEA Spanish automatically acquired lexicon for ENV domain: Lexical Semant...

This is a domain-specific Spanish lexicon for the environment (ENV) domain. This lexicon contains a set of nouns classified into nine different semantic classes. It has...

PANACEA Environment Bilingual Glossary EL-EN (Greek-English)

This folder contains files for bilingual glossary creation from factored phrase tables that include part-of-speech-tagged text for the EL-EN language pair. The tables are first...

PANACEA Environment Corpus n-grams ES (Spanish)

This data set contains Spanish word n-grams and Spanish word/tag/lemma n-grams in the "Environment" (ENV) domain. N-grams are accompanied by their observed frequency counts. The...

PANACEA English Gold Standard for lexical semantic classification

We present a set of English gold standards for different noun classes created in PANACEA to train and test automatic classifiers. To create these gold standards we used the...

PANACEA Labour Legislation Corpus n-grams EN (English)

This data set contains English word n-grams and English word/tag/lemma n-grams in the "Labour Legislation" (LAB) domain. N-grams are accompanied by their observed frequency...

PANACEA Annotated Dependency Greek Labour Legislation Corpus Version 2

PANACEA Annotated Greek Labour Legislation Corpus Version 2 consists of Greek texts in the Labour Legislation (LAB) domain that were collected and automatically annotated in the...

PANACEA Environment Corpus n-grams IT (Italian)

This data set contains Italian word n-grams and Italian word/tag/lemma n-grams in the "Environment" (ENV) domain. N-grams are accompanied by their observed frequency counts. The...

PANACEA Spanish automatically acquired lexicon for ENV domain: Subcategorizat...

This is a domain-specific Spanish lexicon for the environment (ENV) domain. This lexicon contains both subcategorization frames for verbs and lexical semantic classes for nouns....

PANACEA Environment Bilingual Glossary FR-EN (French-English)

This folder contains files for bilingual glossary creation from factored phrase tables that include part-of-speech-tagged text for the FR-EN language pair. The tables are first...

PANACEA Labour and Repubblica merged Italian Lexicon

The Italian PANACEA_rep_lab_merged.lmf.xml is an SCF lexicon obtained by merging two automatically extracted lexicons: a domain lexicon (labour) PANACEA_SCF_IT_labour.lmf.xml and a...

PANACEA Italian V-SUBCAT gold-standard for LAB domain

The PANACEA_SCF_Gold_LAB_IT is a manually created "gold-standard" lexicon of verbal subcategorisation frames for 27 verb lemmas. The language is Italian and the domain is Labour...

PANACEA Italian Parole V-SUBCAT Gold Standard lexicon

The PAROLE-SCF-31-IT is a lexicon of verb subcategorisation frames for 31 verb lemmas extracted from the PAROLE Italian Lexicon (Ruimy et al. 2003).

PANACEA Labour and Parole merged Italian Lexicon

The Italian PAROLE_lab_merged.lmf.xml is an SCF lexicon obtained by merging two automatically extracted lexicons: a domain lexicon (labour) pANACEA_SCF_IT_labour.lmf.xml and a the...

PANACEA English V-SUBCAT gold-standard for LAB domain

This is a domain-specific gold standard for English subcategorization frames, in this case for the labour (LAB) domain. This gold standard was manually developed, choosing a set of...

540 datasets found