Synthetic part of CzEng 2.0

CzEng is a sentence-parallel Czech-English corpus compiled at the Institute of Formal and Applied Linguistics (ÚFAL). The full CzEng 2.0 is freely available for non-commercial research purposes from the project website (https://ufal.mff.cuni.cz/czeng). This release contains only the original monolingual news-text parts (csmono, 53M sentences, and enmono, 79M sentences) together with their automatic (synthetic) translations produced by CUBBITT.

See the attached README for additional details such as the file format.

Identifier
PID http://hdl.handle.net/11234/1-4774
Related Identifier https://arxiv.org/abs/2007.03006
Related Identifier https://ufal.mff.cuni.cz/czeng
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-4774
Provenance
Creator Popel, Martin
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2020
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); http://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Czech; English
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream; application/x-gzip; downloadable_files_count: 3
Discipline Linguistics