The YouTube Corpus of Singapore English Podcasts
The YouTube Corpus of Singapore English Podcasts (YCSEP) contains transcripts of 620 hours of audio from over 1,300 podcast episodes by Singapore-based content creators. The dataset,...
ASR database ARTUR 0.1 (transcriptions)
ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, although only 840...
Parliamentary spoken corpus of Serbian ParlaSpeech-RS 1.0
The ParlaSpeech-RS dataset is built from the transcripts of parliamentary proceedings available in the Serbian part of the ParlaMint (ParlaMint-RS) corpus, and the parliamentary...
ASR training dataset for Serbian JuzneVesti-SR v1.0
The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and its hosted show '15 minuta'...
ASR training dataset for Croatian ParlaSpeech-HR v1.0
The ParlaSpeech-HR dataset is built from parliamentary proceedings available in the Croatian part of the ParlaMint corpus and the parliamentary recordings available from the...
ASR database ARTUR 0.1 (audio)
ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, although only 840...
ASR database ARTUR 1.0 (audio)
Artur 1.0 is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,067 hours of speech, of which 884 hours are...
Parliamentary spoken corpus of Czech ParlaSpeech-CZ 1.0
The ParlaSpeech-CZ dataset is built from the transcripts of parliamentary proceedings available in the Czech part of the ParlaMint corpus, and the parliamentary recordings...
ASR database ARTUR 1.0 (transcriptions)
Artur 1.0 is a speech database designed for the needs of developing automatic speech recognition for the Slovenian language. The complete database includes 1,067 hours of...
Face-domain-specific automatic speech recognition models
This entry contains all the files required to implement face-domain-specific automatic speech recognition (ASR) applications using the Kaldi ASR toolkit...
Parliamentary spoken corpus of Polish ParlaSpeech-PL 1.0
The ParlaSpeech-PL dataset is built from the transcripts of parliamentary proceedings available in the Polish part of the ParlaMint corpus, and the parliamentary recordings...
Parliamentary spoken corpus of Croatian ParlaSpeech-HR 2.0
The ParlaSpeech-HR dataset is built from the transcripts of parliamentary proceedings available in the Croatian part of the ParlaMint corpus, and the parliamentary recordings...
Lithuanian speech-to-text Transcriber
The speech-to-text automatic transcriber for Lithuanian is a containerized application implemented as 17 containers. It covers four areas: administrative, legal, medical and...
