Dataset - B2FIND

INEL Evenki Corpus

Corpus citation: Däbritz, Chris Lasse & Gusev, Valentin. 2021. INEL Evenki Corpus. Version 1.0. Publication date 2021-12-31. Archived at Universität Hamburg....

Replication Data for: Russian verbal borrowings in Udmurt

This is the dataset used in a study of Russian verbal loans in Udmurt. The files contain lists of Russian verbs found in the Udmurt social media corpus...

Map task corpus of heritage BCMS 1.0

The Map task corpus of heritage Bosnian/Croatian/Montenegrin/Serbian (BCMS) consists of elicited conversations (map tasks) by 29 second-generation BCMS speakers originating from...

Posts of German PC Games Online Forum

Contains linguistically annotated data from the online forum PC Games (https://forum.pcgames.de). The forum is concerned with gaming. All posts (approx. 2.4 million) were scraped in...

VinKo (Varieties in Contact) Corpus v1.1

VINKO is a spoken corpus based on crowd-sourced audio recordings that has been designed to provide relevant linguistic information about the minority languages and dialects...

VinKo (Varieties in Contact) Corpus v1.0

VINKO is a spoken corpus based on crowdsourced audio recordings that has been designed to provide relevant linguistic information about the minority languages and dialects...

AThEME Verona-Trento Corpus

The AThEME Verona-Trento Corpus is a spoken corpus composed of data collected during the AThEME project in Work Package 2 ‘Regional Languages’ by the units of Verona and Trento...

Türkisch-Englisch-Deutsch bei Herkunftssprechern (TEDH)

The TEDH has been created as part of the project "Foreign Language Acquisition in German-Turkish bilinguals". The TEDH Corpus contains interviews in three languages:...

Hamburg Corpus of Argentinean Spanish (HaCASpa)

Audio and video recordings of experimental/read and spontaneous speech from adult speakers of Porteño Spanish in Argentina. Speakers are 18-69 years old and from two...

Catalan in a bilingual context (PhonCAT)

Audio recordings of prompted, read and spontaneous speech data from L1 Catalan speakers from Barcelona. The data is stratified according to three different city districts and...

10 datasets found