ParCorFull: A Parallel Corpus Annotated with Full Coreference

ParCorFull is a parallel corpus annotated with full coreference chains. It was created to address an important problem faced by machine translation and other multilingual natural language processing (NLP) technologies: the translation of coreference across languages. Our corpus contains parallel texts for the language pair English-German, two major European languages. Although typologically very close, these languages still have systemic differences in the realisation of coreference, and thus pose problems for multilingual coreference resolution and machine translation. The corpus covers two genres, planned speech (public lectures) and newswire. It is richly annotated for coreference in both languages, covering both nominal coreference and reference to antecedents expressed as clauses, sentences and verb phrases. This resource supports research in natural language processing, contrastive linguistics and translation studies on the mechanisms involved in coreference translation, with the aim of developing a better understanding of the phenomenon.

Identifier
PID http://hdl.handle.net/11372/LRT-2614
Related Identifier http://www.lrec-conf.org/proceedings/lrec2018/summaries/941.html
Related Identifier https://github.com/chardmeier/parcor-full
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11372/LRT-2614
Provenance
Creator Lapshinova-Koltunski, Ekaterina; Hardmeier, Christian; Krielke, Pauline
Publisher Universität des Saarlandes; Uppsala University
Publication Year 2018
Rights Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); http://creativecommons.org/licenses/by-nc-nd/4.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language English; German
Resource Type corpus
Format text/plain; charset=utf-8; application/pdf; application/x-gzip; text/plain; downloadable_files_count: 4
Discipline Linguistics
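
The Metadata Access field above is an OAI-PMH GetRecord request that returns this record as Dublin Core (oai_dc) XML. The following is a minimal sketch, not the repository's own tooling, of how such a request URL can be assembled and how Dublin Core fields can be read from a response. The helper names get_record_url and dublin_core_fields are hypothetical, and SAMPLE is a hand-written, abridged illustration of the response shape built from values in this record, not the actual server output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Endpoint and identifier copied from the Metadata Access field of this record.
OAI_ENDPOINT = "http://lindat.mff.cuni.cz/repository/oai/request"
RECORD_ID = "oai:lindat.mff.cuni.cz:11372/LRT-2614"

# Standard Dublin Core element namespace used inside oai_dc payloads.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def get_record_url(endpoint, identifier, prefix="oai_dc"):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode(
        {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier},
        safe=":/",  # keep the identifier readable, as in the record above
    )
    return f"{endpoint}?{query}"


def dublin_core_fields(xml_text):
    """Collect Dublin Core elements from a response into a dict of lists."""
    fields = {}
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith(DC_NS) and element.text:
            name = element.tag[len(DC_NS):]
            fields.setdefault(name, []).append(element.text.strip())
    return fields


# Illustrative, abridged response (an assumption about the shape, not real output).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>ParCorFull: A Parallel Corpus Annotated with Full Coreference</dc:title>
      <dc:creator>Lapshinova-Koltunski, Ekaterina</dc:creator>
      <dc:creator>Hardmeier, Christian</dc:creator>
      <dc:creator>Krielke, Pauline</dc:creator>
    </oai_dc:dc>
  </metadata></record></GetRecord>
</OAI-PMH>"""

if __name__ == "__main__":
    print(get_record_url(OAI_ENDPOINT, RECORD_ID))
    print(dublin_core_fields(SAMPLE))
```

To work with the live record, the URL returned by get_record_url could be passed to urllib.request.urlopen and the decoded body handed to dublin_core_fields; the sketch stays offline so that it runs without network access.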