Dataset - B2FIND

INEL Enets Corpus

Corpus Citation Shluinsky, Andrey; Khanina, Olesya; Wagner-Nagy, Beáta. 2024. INEL Enets Corpus. Version 1.0. Publication date 2024-11-30....

Replication Data for: Russian verbal borrowings in Udmurt

This is the dataset used in a study of Russian verbal loans in Udmurt. The files contain lists of Russian verbs found in the Udmurt social media corpus...

A corpus of Slavic dialects in Albania

The user-friendly version of the Corpus with search options is available here. These are the main parameters of the corpus:...

Final report—Public part. Contact-induced language change in situations of no...

This is a final report for a DFG-supported project. This project aimed to model language contact outcomes using the methods of statistical language research, social...

INEL Kalmyk Corpus

Corpus citation Baranova, Vlada. 2025. INEL Kalmyk Corpus. Archived at Universität Hamburg. Version 1.0. Publication date...

Posts of German PC Games Online Forum

Contains linguistically annotated data from the online forum PC Games (https://forum.pcgames.de). The forum is devoted to gaming. All posts (approx. 2.4 million) were scraped in...

Catalan in a bilingual context (PhonCAT)

Audio recordings of prompted, read and spontaneous speech data from L1 Catalan speakers from Barcelona. The data is stratified according to three different city districts and...

Hamburg Corpus of Argentinean Spanish (HaCASpa)

Audio and video recordings of experimental/read and spontaneous speech from adult speakers of Porteño Spanish in Argentina. Speakers are 18-69 years old and from two...

Türkisch-Englisch-Deutsch bei Herkunftssprechern (TEDH)

The TEDH has been created as part of the project "Foreign Language Acquisition in German-Turkish bilinguals". The TEDH Corpus contains interviews in three languages:...

INEL Evenki Corpus

Corpus Citation Däbritz, Chris Lasse & Gusev, Valentin. 2021. INEL Evenki Corpus. Version 1.0. Publication date 2021-12-31. Archived at Universität Hamburg....

INEL Nenets Corpus

Corpus Citation Budzisch, Josefina; Wagner-Nagy, Beáta. 2024. INEL Nenets Corpus. Version 1.0. Publication date 2024-12-31....

VinKo (Varieties in Contact) Corpus v1.1

VINKO is a spoken corpus based on crowd-sourced audio recordings that has been designed to provide relevant linguistic information about the minority languages and dialects...

VinKo (Varieties in Contact) Corpus v1.0

VINKO is a spoken corpus based on crowdsourced audio recordings that has been designed to provide relevant linguistic information about the minority languages and dialects...

AThEME Verona-Trento Corpus

The AThEME Verona-Trento Corpus is a spoken corpus composed of data collected during the AThEME project in Work Package 2 ‘Regional Languages’ by the units of Verona and Trento...

KONTATTO v1.0

Kontatto is a corpus of transcribed and annotated spoken data collected by Silvia Dal Negro at the Free University of Bozen/Bolzano. It consists of almost 150,000 orthographic...

VinKo (Varieties in Contact) Corpus v1.2

VINKO is a spoken corpus based on crowd-sourced audio recordings that has been designed to provide relevant linguistic information about the minority languages and dialects...

Map task corpus of heritage BCMS 1.0

The Map task corpus of heritage Bosnian/Croatian/Montenegrin/Serbian (BCMS) consists of elicited conversations (map tasks) by 29 second-generation BCMS speakers originating from...

INEL Nganasan Corpus

Corpus Citation Brykina, Maria; Gusev, Valentin; Szeverényi, Sándor; Wagner-Nagy, Beáta. INEL Nganasan Corpus. Version 1.0. Publication date 2025-05-02....

INEL Tavda Mansi Corpus

Corpus Citation Sipőcz, Katalin & Wagner-Nagy, Beáta. 2025. INEL Tavda Mansi Corpus. Version 1.0. Publication date 2025-05-15....

HELLO CAMPANIA! Philippines Collection

The Philippines collection contains data for 66 speakers: 32 first generation (G1), 28 second generation (G2), 6 homeland (G0). The collection contains three folders for each...

21 datasets found