Czech-English Parallel Corpus 1.0 (CzEng 1.0)

CzEng 1.0 is the fourth release of a sentence-parallel Czech-English corpus compiled at the Institute of Formal and Applied Linguistics (ÚFAL). It is freely available for non-commercial research purposes.

CzEng 1.0 contains 15 million parallel sentences (233 million English and 206 million Czech tokens) from seven different types of sources, automatically annotated at the surface and deep (a- and t-) layers of syntactic representation.

Identifier
PID http://hdl.handle.net/11234/1-1458
Related Identifier http://hdl.handle.net/11858/00-097C-0000-0001-4916-9
Related Identifier http://ufal.mff.cuni.cz/czeng
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-1458
Provenance
Creator Bojar, Ondřej; Žabokrtský, Zdeněk; Dušek, Ondřej; Galuščáková, Petra; Majliš, Martin; Mareček, David; Maršík, Jiří; Novák, Michal; Popel, Martin; Tamchyna, Aleš
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2011
Funding Reference info:eu-repo/grantAgreement/EC/FP7/231720; info:eu-repo/grantAgreement/EC/FP7/247762
Rights Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0); http://creativecommons.org/licenses/by-nc-sa/3.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Czech; English
Resource Type corpus
Format application/x-tar; text/plain; charset=utf-8; downloadable_files_count: 6
Discipline Linguistics
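
The Metadata Access URL listed under Identifier is an OAI-PMH GetRecord request for the Dublin Core (oai_dc) form of this record. The following is a minimal Python sketch of how such a request URL could be assembled and how the Dublin Core fields in a response could be collected. The function names `build_getrecord_url` and `dublin_core_fields` are hypothetical helpers introduced for illustration, and the sketch assumes the response uses the standard Dublin Core element namespace; it does not contact the repository itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL and identifier are taken from the Metadata Access line of this record.
OAI_BASE = "http://lindat.mff.cuni.cz/repository/oai/request"
RECORD_ID = "oai:lindat.mff.cuni.cz:11234/1-1458"

# Standard Dublin Core element namespace, in ElementTree's "{uri}" tag notation.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def build_getrecord_url(identifier, metadata_prefix="oai_dc", base=OAI_BASE):
    """Assemble an OAI-PMH GetRecord URL (hypothetical helper)."""
    params = [
        ("verb", "GetRecord"),
        ("metadataPrefix", metadata_prefix),
        ("identifier", identifier),
    ]
    # Keep ':' and '/' unescaped so the URL matches the form shown in the record.
    return base + "?" + urlencode(params, safe=":/")


def dublin_core_fields(xml_text):
    """Collect the text of every Dublin Core element into a dict of lists
    (hypothetical helper; assumes an oai_dc payload)."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root.iter():
        if element.tag.startswith(DC_NS) and element.text and element.text.strip():
            name = element.tag[len(DC_NS):]
            fields.setdefault(name, []).append(element.text.strip())
    return fields


url = build_getrecord_url(RECORD_ID)
print(url)
```

The XML returned for that URL (fetched, for example, with `urllib.request.urlopen`) can be passed as a string to `dublin_core_fields`, which groups repeated elements such as `creator` or `language` into lists.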