Dataset - B2FIND

Large Language Models for Research Data Management?! 2025 (LLMs4RDM 2025)

Research data management (RDM) has become an important discipline that enables researchers to effectively organise, preserve and share their research results. RDM is a new...

Vystadial 2013 – scripts

Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems....

Large Corpus of Czech Parliament Plenary Hearings

We present a large corpus of Czech parliament plenary sessions. The corpus consists of approximately 444 hours of speech data and corresponding text transcriptions. The whole...

A Speech Test Set of Practice Business Presentations with Additional Relevant...

We present a test corpus of audio recordings and transcriptions of presentations of students' enterprises, together with their slides and web pages. The corpus is intended for...

A Small Dataset for English-to-Czech Speech Translation in the Travel Domain

This small dataset contains 3 speech corpora collected using the Alex Translate telephone service (https://ufal.mff.cuni.cz/alex#alex-translate). The "part1" and "part2" corpora...

Texture evolution of an AA7075_O alloy during cold radial forging

Radial forging is an open-die forging process that uses radially moving dies to produce cylindrical or tubular components with different internal and external profiles. The...

DiaBiz ASR benchmark

An evaluation report, with accompanying datasets, benchmarking the performance of commercially available ASR services for Polish on the DiaBiz corpus.

7 datasets found