Dataset - B2FIND

Training corpus of spoken Slovenian ROG 1.1

Training corpus of spoken Slovenian ROG 1.1 is an improved version of the ROG 1.0 corpus (http://hdl.handle.net/11356/1992). The main differences between the original and the...

Training corpus of spoken Slovenian ROG 1.0

Training corpus of spoken Slovenian ROG 1.0 is the main resource for training and evaluating technologies that process Slovenian speech or speech transcripts, such...

Corpus of spoken Slovenian ROG-Dialog 1.0

Corpus of spoken Slovenian ROG-Dialog consists of volunteered audio recorded by students, who asked their relatives or acquaintances to talk on record in their homes. The...

Corpus of conversational humor Krohot 1.0

The KROHOT corpus consists of 10 audio recordings of private, spontaneous conversations between two or three speakers, with a total duration of 232 minutes. Most recordings were...

Spoken corpora of parliamentary debates ParlaSpeech 3.0

The ParlaSpeech corpora are built from the transcripts of parliamentary proceedings of the Croatian, Serbian, Polish, and Czech parliaments available in the ParlaMint 4.0 corpus...

ASR model evaluator

A Docker image with an ASR evaluation tool that supports WER calculation on punctuated and capitalised transcripts. The UI allows uploading the reference and predicted...

Business English learner speech corpus SAPS

SAPS is a specialized speech corpus which contains business meeting simulations in English between undergraduate students of Languages for Business and Economics at the School...

Spoken corpus Gos 2.1 (transcriptions)

The spoken corpus Gos 2.1 is the reference speech corpus of the Slovenian language. This second edition contains about 300 hours of speech, or 2.4 million words, 127 thousand...

The "Mići Princ" text and speech dataset of Chakavian micro-dialects

The Mići Princ "text and speech" dialectal dataset is a word-aligned version of the translation of The Little Prince into various Chakavian micro-dialects, released by the...

Spoken corpus Gos VideoLectures 3.0 (transcription)

Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures corpus...

Spoken corpus Gos VideoLectures 1.0 (transcription)

Gos VideoLectures is an add-on to the Gos reference speech corpus of Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures...

Spoken corpus Gos 2.0 (transcriptions)

The spoken corpus Gos 2.0 is the reference speech corpus of the Slovenian language. This second edition contains about 300 hours of speech, or 2.4 million words, 127 thousand...

Speech Database of Spoken Flight Information Enquiries SOFES 1.0

The SOFES speech database (Spoken Flight Enquiries in Slovene) is a collection of transcribed and segmented audio recordings of spoken flight-information enquiries in Slovene....

Parliamentary spoken corpus of Serbian ParlaSpeech-RS 1.0

The ParlaSpeech-RS dataset is built from the transcripts of parliamentary proceedings available in the Serbian part of the ParlaMint (ParlaMint-RS) corpus, and the parliamentary...

ASR training dataset for Serbian JuzneVesti-SR v1.0

The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and its host show called '15 minuta'...

ASR training dataset for Croatian ParlaSpeech-HR v1.0

The ParlaSpeech-HR dataset is built from parliamentary proceedings available in the Croatian part of the ParlaMint corpus and the parliamentary recordings available from the...

Spoken corpus Gos VideoLectures 4.1 (transcription)

Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. It can be used for training...

Parliamentary spoken corpus of Czech ParlaSpeech-CZ 1.0

The ParlaSpeech-CZ dataset is built from the transcripts of parliamentary proceedings available in the Czech part of the ParlaMint corpus, and the parliamentary recordings...

Spoken corpus Gos VideoLectures 4.0 (transcription)

Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures corpus...

Spoken corpus Gos VideoLectures 4.2 (transcription)

Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. It can be used for training...

25 datasets found