Post-edited and error annotated machine translation corpus PE²rr 1.0

The PE²rr corpus contains source-language texts from different domains, their automatically generated translations into several morphologically rich languages, the post-edited versions of those translations, and error annotations of the post-edit operations performed. Its main advantage is that it combines post-editing and error classification, which have usually been treated as two independent tasks although they are naturally related.

Identifier
PID http://hdl.handle.net/11356/1065
Related Identifier http://www.lrec-conf.org/proceedings/lrec2016/summaries/405.html
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1065
Provenance
Creator Popović, Maja; Arčan, Mihael
Publisher Insight Centre for Data Analytics, National University of Ireland, Galway
Publication Year 2016
Funding Reference info:eu-repo/grantAgreement/EC/H2020/644333
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian (Slovene); Serbian; German; Spanish (Castilian); English
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream; downloadable_files_count: 1
Discipline Linguistics