32 datasets found

ResourceType: corpus
Keywords: speech corpus

  • Corpus of precisely articulated Czech speech

    The corpus contains speech data from 2 native Czech speakers, one male and one female. The speech is very precisely articulated, up to hyper-articulated, and the speech rate is low. The...
  • Czech Malach Cross-lingual Speech Retrieval Test Collection

    The package contains Czech recordings from the Visual History Archive, which consists of interviews with Holocaust survivors. The archive consists of audio recordings, four...
  • A Small Dataset for English-to-Czech Speech Translation in the Travel Domain

    This small dataset contains 3 speech corpora collected using the Alex Translate telephone service (https://ufal.mff.cuni.cz/alex#alex-translate). The "part1" and "part2" corpora...
  • ParCzech 3.0

    The ParCzech 3.0 corpus is the third version of ParCzech consisting of stenographic protocols that record the Chamber of Deputies’ meetings held in the 7th term (2013-2017) and...
  • Vystadial 2013 – Czech data

    Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems....
  • Business English learner speech corpus SAPS

    SAPS is a specialized speech corpus which contains business meeting simulations in English between undergraduate students of Languages for Business and Economics at the School...
  • Clarin-PL Mobile Corpus (EMU)

    Polish speech corpus of read speech recorded over the phone. Contains many speakers, each reading a few dozen different sentences and a list of words with rare phonemes. Useful...
  • EU Parliament Speech corpus

    A collection of 1040 EU Parliament speeches with transcriptions and annotations. Includes the original speeches and PL/EN translations.
  • Cyfry

    A small spoken digits corpus in Polish. Contains 488 recordings of 25 speakers reading 20 digits (0-9) each. Amounts to around 76 minutes of recordings. Split into train (~72%),...
  • Clarin-PL Studio Corpus (EMU;updated phonetics)

    Polish speech corpus of read speech recorded in a studio. Contains many speakers, each reading a few dozen different sentences and a list of words with rare phonemes. Useful for...
  • Clarin-PL Studio Corpus (EMU)

    Polish speech corpus of read speech recorded in a studio. Contains many speakers, each reading a few dozen different sentences and a list of words with rare phonemes. Useful for...
  • Read Speech Corpus (7G)

    The corpus of read Lithuanian speech „7G“ was compiled in 2015-2016. The corpus consists of 352 audio recordings with a total duration of over 7 hours. Seven different speakers...
You can also access this registry using the API (see API Docs).