Prague Czech-English Dependency Treebank 2.0 - Russian translation


Prague Czech-English Dependency Treebank - Russian translation (PCEDT-R) is a project that translates a subset of the Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) into Russian and linguistically annotates the Russian translations, with emphasis on coreference and on cross-lingual alignment of coreferential expressions. Development of the corpus is currently driven by the cross-lingual comparison of the means of expressing coreference.

The current version 0.5 is a preliminary version, which contains (+ denotes new features):
* complete PCEDT 2.0 documents "wsj_1900"-"wsj_1949"
* Czech-English word alignment of coreferential expressions annotated manually mainly on the t-layer
+ Russian translations of the original English sentences
+ automatic tokenization, part-of-speech tagging and morphological analysis for Russian
+ automatic word alignment between all Czech and Russian words
+ manual alignment between Russian and the other two languages on possessive pronouns

Metadata Access
Creator Novák, Michal; Nedoluzhko, Anna; Schwarz (Khoroshkina), Anna
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2016
Rights CC-BY-NC-SA + LDC99T42;; RES
OpenAccess true
Contact lindat-help(at)
Language English; Czech; Russian
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics