Dataset - B2FIND

DigiDiaDem Speech-Cognitive Dataset (DSCD-CZ-2)

An updated and expanded version of the dataset was created to investigate the speech and cognitive performance of people with varying degrees of cognitive impairment, primarily...

The YouTube Corpus of Singapore English Podcasts

The YouTube Corpus of Singapore English Podcasts (YCSEP) contains transcripts of more than 1,300 podcast episodes, totaling 620 hours, by Singapore-based content creators. The dataset,...

ASR database ARTUR 0.1 (transcriptions)

ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, although only 840...

Parliamentary spoken corpus of Serbian ParlaSpeech-RS 1.0

The ParlaSpeech-RS dataset is built from the transcripts of parliamentary proceedings available in the Serbian part of the ParlaMint (ParlaMint-RS) corpus, and the parliamentary...

ASR training dataset for Serbian JuzneVesti-SR v1.0

The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and its host show called '15 minuta'...

ASR training dataset for Croatian ParlaSpeech-HR v1.0

The ParlaSpeech-HR dataset is built from parliamentary proceedings available in the Croatian part of the ParlaMint corpus and the parliamentary recordings available from the...

ASR database ARTUR 0.1 (audio)

ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, although only 840...

ASR database ARTUR 1.0 (audio)

Artur 1.0 is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,067 hours of speech, of which 884 hours are...

Parliamentary spoken corpus of Czech ParlaSpeech-CZ 1.0

The ParlaSpeech-CZ dataset is built from the transcripts of parliamentary proceedings available in the Czech part of the ParlaMint corpus, and the parliamentary recordings...

ASR database ARTUR 1.0 (transcriptions)

Artur 1.0 is a speech database designed for the needs of developing automatic speech recognition for the Slovenian language. The complete database includes 1,067 hours of...

Face-domain-specific automatic speech recognition models

This entry contains all the files required to implement face-domain-specific automatic speech recognition (ASR) applications using the Kaldi ASR toolkit...

Parliamentary spoken corpus of Polish ParlaSpeech-PL 1.0

The ParlaSpeech-PL dataset is built from the transcripts of parliamentary proceedings available in the Polish part of the ParlaMint corpus, and the parliamentary recordings...

Parliamentary spoken corpus of Croatian ParlaSpeech-HR 2.0

The ParlaSpeech-HR dataset is built from the transcripts of parliamentary proceedings available in the Croatian part of the ParlaMint corpus, and the parliamentary recordings...

Lithuanian speech-to-text Transcriber

The automatic speech-to-text transcriber for Lithuanian is a containerized application implemented as 17 containers. It covers four areas: administrative, legal, medical and...

14 datasets found