A Small Dataset for English-to-Czech Speech Translation in the Travel Domain

PID

This small dataset contains 3 speech corpora collected using the Alex Translate telephone service (https://ufal.mff.cuni.cz/alex#alex-translate). The "part1" and "part2" corpora contain English speech with transcriptions and Czech translations. These recordings were collected from users of the service. Part 1 contains earlier recordings, filtered to include only clean speech; Part 2 contains later recordings with no filtering applied. The "cstest" corpus contains recordings of artificially created sentences, each containing one or more Czech names of places in the Czech Republic. These were recorded by a multinational group of students studying in Prague.

Identifier
PID http://hdl.handle.net/11234/1-1735
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-1735
Provenance
Creator Cífka, Ondřej; Bojar, Ondřej
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2016
Funding Reference info:eu-repo/grantAgreement/EC/H2020/645452
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); http://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language English; Czech
Resource Type corpus
Format application/octet-stream; application/zip; text/plain; charset=utf-8; downloadable_files_count: 4
Discipline Linguistics