Post-edited and error annotated machine translation corpus PE²rr 1.0

Dataset

The PE²rr corpus contains source-language texts from different domains together with their automatically generated translations into several morphologically rich languages, the post-edited versions of those translations, and error annotations of the post-edit operations performed. Its main advantage is that it combines post-editing and error classification, which have usually been treated as two independent tasks even though they are naturally related.

Identifier
PID	http://hdl.handle.net/11356/1065
Related Identifier	http://www.lrec-conf.org/proceedings/lrec2016/summaries/405.html
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1065
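
The Metadata Access URL above is an OAI-PMH GetRecord request for the record in Dublin Core (oai_dc) format. As a minimal sketch of how such a record could be read programmatically, the Python below rebuilds that URL from its parts and extracts the Dublin Core fields from a response. The function names are hypothetical, and SAMPLE_RESPONSE is a hand-written, abbreviated illustration assembled from fields of this record, not the actual server output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the Metadata Access field of this record.
OAI_ENDPOINT = "http://www.clarin.si/repository/oai/request"
# Standard Dublin Core element namespace, in ElementTree's {uri} notation.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hand-written, abbreviated example of a GetRecord response (an assumption,
# not the real server output).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:creator>Popović, Maja</dc:creator>
          <dc:creator>Arčan, Mihael</dc:creator>
          <dc:identifier>http://hdl.handle.net/11356/1065</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
"""


def build_get_record_url(identifier, prefix="oai_dc", endpoint=OAI_ENDPOINT):
    """Build an OAI-PMH GetRecord URL; ':' and '/' are left unescaped."""
    query = urlencode(
        {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier},
        safe=":/",
    )
    return f"{endpoint}?{query}"


def dublin_core_fields(xml_text):
    """Collect Dublin Core elements into a dict of name -> list of values."""
    fields = {}
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.startswith(DC_NS) and elem.text and elem.text.strip():
            fields.setdefault(elem.tag[len(DC_NS):], []).append(elem.text.strip())
    return fields


def fetch_record(identifier):
    """Download and parse the live record (requires network access)."""
    with urlopen(build_get_record_url(identifier)) as response:
        return dublin_core_fields(response.read().decode("utf-8"))
```

Calling dublin_core_fields(SAMPLE_RESPONSE) works offline, while fetch_record("oai:www.clarin.si:11356/1065") would query the repository directly. Values are kept as lists because Dublin Core elements such as creator may repeat.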

Provenance
Creator	Popović, Maja; Arčan, Mihael
Publisher	Insight Centre for Data Analytics, National University of Ireland, Galway
Publication Year	2016
Funding Reference	info:eu-repo/grantAgreement/EC/H2020/644333
Rights	Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Slovenian (Slovene); Serbian; German; Spanish (Castilian); English
Resource Type	corpus
Format	text/plain; charset=utf-8; application/octet-stream; downloadable_files_count: 1
Discipline	Linguistics