Dataset - B2FIND

Spoken corpora of parliamentary debates ParlaSpeech 3.0

The ParlaSpeech corpora are built from the transcripts of parliamentary proceedings of Croatian, Serbian, Polish, and Czech parliaments available in the ParlaMint 4.0 corpus...

Dataset for primary stress identification in Croatian and related languages a...

The dataset contains recordings and offset annotations of a sample of the Croatian parliamentary recordings from the corpus ParlaSpeech-HR. It contains training and testing data...

The "Mići Princ" text and speech dataset of Chakavian micro-dialects

The Mići Princ "text and speech" dialectal dataset is a word-aligned version of the translation of The Little Prince into various Chakavian micro-dialects, released by the...

Parliamentary spoken corpus of Serbian ParlaSpeech-RS 1.0

The ParlaSpeech-RS dataset is built from the transcripts of parliamentary proceedings available in the Serbian part of the ParlaMint (ParlaMint-RS) corpus, and the parliamentary...

ASR training dataset for Serbian JuzneVesti-SR v1.0

The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and its host show called '15 minuta'...

ASR training dataset for Croatian ParlaSpeech-HR v1.0

The ParlaSpeech-HR dataset is built from parliamentary proceedings available in the Croatian part of the ParlaMint corpus and the parliamentary recordings available from the...

Parliamentary spoken corpus of Croatian ParlaSpeech-HR 2.0

The ParlaSpeech-HR dataset is built from the transcripts of parliamentary proceedings available in the Croatian part of the ParlaMint corpus, and the parliamentary recordings...

7 datasets found