Dataset - B2FIND

Albanian Spoken Corpus in Kosovo 1.0

This is the third version of a spoken corpus of Albanian in Kosovo. The corpus data is based on short life stories of 212 informants out of a sample of 1800 speakers...

ASR database ARTUR 0.1 (transcriptions)

ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, although only 840...

ASR database ARTUR 0.1 (audio)

ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, although only 840...

ASR database ARTUR 1.0 (audio)

Artur 1.0 is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,067 hours of speech. 884 hours are...

ASR database ARTUR 1.0 (transcriptions)

Artur 1.0 is a speech database designed for the needs of developing automatic speech recognition for the Slovenian language. The complete database includes 1,067 hours of...

List of formulaic sequences in spoken Slovenian

This document contains 2,374 formulaic sequences in spoken Slovenian, i.e. frequently recurring strings of two to five words, manually annotated for syntactic structure,...

Corpus of metaphorical expressions in spoken Slovene language G-KOMET 1.0

G-KOMET (a corpus of metaphorical expressions in spoken Slovene language) is an upgrade of the hand-annotated written corpus for metaphorical expressions KOMET...

A Digital Dictionary of Tunis Arabic - TUNICO (ELEXIS)

A corpus-based dictionary, enriched with historical data. The dictionary was not only built on data from the corpus of spoken language that was compiled in the same project, but...

TED-ELH Parallel Corpus

The corpus contains parallel-aligned scripts of TED Talks in English, Lithuanian, and Hebrew. It contains spoken language data.

29 datasets found