- The "Mići Princ" text and speech dataset of Chakavian micro-dialects
  The Mići Princ "text and speech" dialectal dataset is a word-aligned version of the translation of The Little Prince into various Chakavian micro-dialects, released by the...
- Spoken corpus Gos VideoLectures 2.0 (audio)
  Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures corpus...
- Spoken corpus Gos VideoLectures 3.0 (transcription)
  Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures corpus...
- Spoken corpus Gos VideoLectures 1.0 (transcription)
  Gos VideoLectures is an add-on to the Gos reference speech corpus of Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures...
- Speech Database of Spoken Flight Information Enquiries SOFES 1.0
  The SOFES speech database (Spoken Flight Enquiries in Slovene) is a collection of transcribed and segmented audio recordings of spoken flight-information enquiries in Slovene....
- Spoken corpus Gos VideoLectures 3.0 (audio)
  Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures corpus...
- Spoken corpus Gos VideoLectures 1.0 (audio)
  Gos VideoLectures is an add-on to the Gos reference speech corpus of Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures...
- SNABI database for continuous speech recognition 1.2
  The SNABI speech database can be used to train continuous speech recognition for the Slovene language. The database comprises 1530 sentences, 150 words and the alphabet. 132...
- ASR database ARTUR 0.1 (transcriptions)
  ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, although only 840...
- Parliamentary spoken corpus of Serbian ParlaSpeech-RS 1.0
  The ParlaSpeech-RS dataset is built from the transcripts of parliamentary proceedings available in the Serbian part of the ParlaMint (ParlaMint-RS) corpus, and the parliamentary...
- ASR training dataset for Serbian JuzneVesti-SR v1.0
  The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and its host show called '15 minuta'...
- ASR training dataset for Croatian ParlaSpeech-HR v1.0
  The ParlaSpeech-HR dataset is built from parliamentary proceedings available in the Croatian part of the ParlaMint corpus and the parliamentary recordings available from the...
- Spoken corpus Gos VideoLectures 4.1 (transcription)
  Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. It can be used for training...
- ASR database ARTUR 0.1 (audio)
  ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, although only 840...
- ASR database ARTUR 1.0 (audio)
  Artur 1.0 is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,067 hours of speech. 884 hours are...
- Parliamentary spoken corpus of Czech ParlaSpeech-CZ 1.0
  The ParlaSpeech-CZ dataset is built from the transcripts of parliamentary proceedings available in the Czech part of the ParlaMint corpus, and the parliamentary recordings...
- Spoken corpus Gos VideoLectures 4.0 (transcription)
  Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures corpus...
- Spoken corpus Gos VideoLectures 4.2 (transcription)
  Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. It can be used for training...
- ASR database ARTUR 1.0 (transcriptions)
  Artur 1.0 is a speech database designed for the needs of developing automatic speech recognition for the Slovenian language. The complete database includes 1,067 hours of...
- Parliamentary spoken corpus of Polish ParlaSpeech-PL 1.0
  The ParlaSpeech-PL dataset is built from the transcripts of parliamentary proceedings available in the Polish part of the ParlaMint corpus, and the parliamentary recordings...