Prague Czech-English Dependency Treebank 2.0 - Russian translation


Prague Czech-English Dependency Treebank - Russian translation (PCEDT-R) is a project that translates a subset of the Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) into Russian and linguistically annotates the Russian translations, with emphasis on coreference and the cross-lingual alignment of coreferential expressions. Development of the corpus is currently driven by the cross-lingual comparison of the means of expressing coreference.

The current version 0.5 is a preliminary version, which contains (+ denotes new features):
* complete PCEDT 2.0 documents "wsj_1900"-"wsj_1949"
* Czech-English word alignment of coreferential expressions annotated manually mainly on the t-layer
+ Russian translations of the original English sentences
+ automatic tokenization, part-of-speech tagging and morphological analysis for Russian
+ automatic word alignment between all Czech and Russian words
+ manual alignment between Russian and the other two languages on possessive pronouns

Identifier
PID http://hdl.handle.net/11234/1-1791
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-1791
Provenance
Creator Novák, Michal; Nedoluzhko, Anna; Schwarz (Khoroshkina), Anna
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2016
Rights CC-BY-NC-SA + LDC99T42; https://lindat.mff.cuni.cz/repository/xmlui/page/license-pcedt2; RES
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language English; Czech; Russian
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics
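
The Metadata Access URL above has the shape of an OAI-PMH GetRecord request. As a minimal sketch, assuming only the Python standard library, the URL can be rebuilt from its parts and the handle behind the PID can be read off the OAI identifier. The function names build_getrecord_url and handle_from_identifier are hypothetical helpers introduced for illustration; they are not part of any repository API, and the sketch makes no network request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Values copied from the Identifier fields of this record.
OAI_BASE = "http://lindat.mff.cuni.cz/repository/oai/request"
OAI_IDENTIFIER = "oai:lindat.mff.cuni.cz:11234/1-1791"


def build_getrecord_url(base, identifier, metadata_prefix="oai_dc"):
    """Assemble a GetRecord URL (hypothetical helper, for illustration)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    }
    # Keep ':' and '/' unescaped so the result matches the record's URL.
    return base + "?" + urlencode(params, safe=":/")


def handle_from_identifier(identifier):
    """Return the handle part of an 'oai:<host>:<handle>' identifier."""
    return identifier.split(":", 2)[2]


url = build_getrecord_url(OAI_BASE, OAI_IDENTIFIER)
query = parse_qs(urlsplit(url).query)  # maps each parameter to a list of values
pid = "http://hdl.handle.net/" + handle_from_identifier(OAI_IDENTIFIER)

print(url)
print(query["verb"][0], query["metadataPrefix"][0])
print(pid)
```

Passing a different metadata_prefix would request another metadata format; which prefixes the repository actually supports is not stated in this record.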