Dataset - B2FIND

Lexicon of Lithuanian Basketball Slang Terms

The lexicon was compiled by crowdsourcing, using the dictionary-editing system LEXONOMY. It was compiled as a study project by a group of students in the...

Corpus of spoken Slovenian ROG-Dialog 1.0

Corpus of spoken Slovenian ROG-Dialog consists of volunteered audio recorded by students, who asked their relatives or acquaintances to talk on record in their homes. The...

The "Mobile languages" corpus MoJezik 1.0 (audio)

The "Mobile Languages" corpus documents in-depth, semi-structured sociolinguistic interviews with speakers from two Slovene regions and distinctive dialects: Idrija (Cerkno...

The "Mobile languages" corpus MoJezik 1.0 (transcription)

The "Mobile Languages" corpus documents in-depth, semi-structured sociolinguistic interviews with speakers from two Slovene regions and distinctive dialects: Idrija (Cerkno...

Languages in Migration

LANGUAGES IN MIGRATION is designed as a representation of authentic spoken Czech and German that is used in informal speech (private environment, spontaneity, unpreparedness...

Prague Dependency Treebank of Spoken Language (PDTSL) 0.5

The first edition of a speech corpus with a speech reconstruction layer (edited transcript). The speech reconstruction project for Czech and English was started at UFAL...

ORTOFON v3: corpus of informal spoken Czech with multi-tier transcription (tr...

ORTOFON v3 is a corpus of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) that covers the area of the whole Czech...

Large-Scale Colloquial Persian 0.5

"Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a...

Bavaria's Dialects Online

Bavaria's Dialects Online (BDO) is the digital language information system of the three projects "Bavarian Dictionary", "Franconian Dictionary", and "Dialectological Information...

ORTOFON v1: balanced corpus of informal spoken Czech with multi-tier transcri...

ORTOFON v1 is designed as a representation of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) in the area of the whole...

ORAL2013: balanced corpus of informal spoken Czech (transcriptions & audio)

ORAL2013 is designed as a representation of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) in the area of the whole...

ORTOFON v1: balanced corpus of informal spoken Czech with multi-tier transcri...

ORTOFON v1 is designed as a representation of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) in the area of the whole...

ORTOFON v3: corpus of informal spoken Czech with multi-tier transcription (tr...

ORTOFON v3 is a corpus of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) that covers the area of the whole Czech...

ORAL2013: balanced corpus of informal spoken Czech (transcriptions)

ORAL2013 is designed as a representation of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) in the area of the whole...

Das Kiezdeutschkorpus (KiDKo)

A multi-modal digital corpus of spontaneous discourse data from informal, oral peer group in multi- and monoethnic speech communities. Multimodal, digital corpus...

Das Kiezdeutschkorpus "KiDKo": Zusatzkorpora

Additional corpus I, "Frog Story": oral presentation of the picture story (Mayer 1969) and written reproduction of the "Frog Story" from memory. Additional corpus...

EKKD115: Eesti mitmekeelse keelekeskkonna andmestik

This repository contains the texts collected within the project "Eesti mitmekeelse keelekeskkonna andmestik" (Dataset of the Estonian multilingual language environment) and a link to the photo map of linguistic landscapes. 1) Estonian-English bilingual...

Suuline eesti keel arvudes. Sagedusandmestikud

This repository contains the frequency datasets compiled within the project "Suuline eesti keel arvudes" (Spoken Estonian in Numbers), which describe spoken Estonian. The datasets are based on the Estonian...

The "Mići Princ" text and speech dataset of Chakavian micro-dialects

The "Mići Princ" text and speech dialectal dataset is a word-aligned version of the translation of The Little Prince into various Chakavian micro-dialects, released by the...

Albanian Spoken Corpus in Kosovo 0.2

This is the second version of a spoken corpus of Albanian in Kosovo. The data of the corpus is based on short life stories of 212 informants out of a sample of 1,800 speakers...

29 datasets found