Dataset - B2FIND

LegISTyr test set

LegISTyr is a machine translation test set for evaluating the quality of legal terminology translation from Italian to South Tyrolean German, a minor standard variety of German....

VinKo (Varieties in Contact) Corpus v1.1

VINKO is a spoken corpus based on crowd-sourced audio recordings that has been designed to provide relevant linguistic information about the minority languages and dialects...

VinKo (Varieties in Contact) Corpus v1.0

VINKO is a spoken corpus based on crowd-sourced audio recordings that has been designed to provide relevant linguistic information about the minority languages and dialects...

MT@BZ annotation guidelines v1.0

The MT@BZ annotation guidelines are guidelines for assessing the quality of legal Italian-German machine translation. In particular, they cover the South Tyrolean German variety. They...

MT@BZ translation corpus v1.0

The MT@BZ is a translation corpus that consists of 52 decrees published by the Autonomous Province of Bolzano (South Tyrol) aligned with their machine-translated versions. More...

AThEME Verona-Trento Corpus

The AThEME Verona-Trento Corpus is a spoken corpus composed of data collected during the AThEME project in Work Package 2 ‘Regional Languages’ by the units of Verona and Trento...

Kolipsi-1 Corpus v1.0

The Kolipsi-1 L2 is a written learner corpus of German and Italian L2 speakers originating from South Tyrol (Italy). It has been developed as a by-product of the KOLIPSI project...

LEONIDE - Longitudinal Learner Corpus in Italiano, Deutsch and English 1.1

LEONIDE is a longitudinal corpus of student essays documenting the language competences and writing development of lower secondary school students in three different languages....

MERLIN Written Learner Corpus for Czech, German, Italian 1.1

The MERLIN corpus is a written learner corpus for Czech, German, and Italian that has been designed to illustrate the Common European Framework of Reference for Languages (CEFR)...

LEKO v1.0

The LEKO corpora LEKO_Kolipsi and LEKO_Merlin provide lexical annotations for phraseological elements in Italian L2 writing on the basis of a subset of the texts of the...

KONTATTO v1.0

Kontatto is a corpus of transcribed and annotated spoken data collected by Silvia Dal Negro at the Free University of Bozen/Bolzano. It consists of almost 150,000 orthographic...

DIDI - The DiDi Corpus of South Tyrolean CMC 1.0.0

The DiDi corpus has an overall size of around 600,000 tokens gathered from 136 South Tyrolean Facebook users who participated in the DiDi project. It consists of 11,102 Facebook...

VinKo (Varieties in Contact) Corpus v1.2

VINKO is a spoken corpus based on crowd-sourced audio recordings that has been designed to provide relevant linguistic information about the minority languages and dialects...

Kolipsi-2 Corpus v1.1

The Kolipsi-2 Corpus is a written learner corpus of German and Italian L2 speakers originating from South Tyrol (Italy). It has been developed as a by-product of the KOLIPSI II...

Kolipsi-1 Corpus v1.1

The Kolipsi-1 L2 is a written learner corpus of German and Italian L2 speakers originating from South Tyrol (Italy). It has been developed as a by-product of the KOLIPSI project...

e-LIS: Electronic Bilingual Dictionary Italian Sign Language (LIS) – Italian ...

Legacy files of the former Electronic Bilingual Dictionary Italian Sign Language (LIS) - Italian, the first prototype of an online Italian Sign Language reference dictionary...

Kolipsi-2 Corpus v1.0

The Kolipsi-2 Corpus is a written learner corpus of German and Italian L2 speakers originating from South Tyrol (Italy). It has been developed as a by-product of the KOLIPSI II...

MERLIN Written Learner Corpus for Czech, German, Italian 1.0

The MERLIN corpus is a written learner corpus for Czech, German, and Italian that has been designed to illustrate the Common European Framework of Reference for Languages (CEFR)...

18 datasets found