MT@BZ translation corpus v1.0

MT@BZ is a translation corpus consisting of 52 decrees published by the Autonomous Province of Bolzano (South Tyrol), each aligned with its machine-translated version. More precisely, it contains the official German and Italian versions of the same 26 decrees; the project team machine-translated the German versions into Italian and the Italian versions into German. Ten of the 26 decrees relate to COVID-19, while the other 16 cover miscellaneous topics. Overall, the corpus comprises around 130,000 words. The machine translation was carried out with a customized version of ModernMT. The corpus was then uploaded to the annotation platform WebAnno and later transferred to INCEpTION. Four annotators annotated the errors made by the machine translation system according to an ad hoc error taxonomy for quality assessment. Finally, the annotations were curated to create a gold-standard corpus.

Identifier
PID http://hdl.handle.net/20.500.12124/60
Related Identifier https://gitlab.inf.unibz.it/commul/mt-bz/data/bundle/-/tags/v1.0
Related Identifier https://events.tuni.fi/uploads/2023/06/11678752-proceedings-eamt2023.pdf
Related Identifier https://www.eurac.edu/it/institutes-centers/istituto-di-linguistica-applicata/projects/mtbz
Metadata Access http://clarin.eurac.edu/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin.eurac.edu:20.500.12124/60
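The Metadata Access URL above is an OAI-PMH GetRecord request for the record in Dublin Core (oai_dc) format. As a minimal sketch, assuming the endpoint returns standard OAI-PMH 2.0 XML, the record can be retrieved and its Dublin Core fields read with the Python standard library alone. The helper names (build_get_record_url, parse_dc_fields, fetch_record) are hypothetical and not part of any repository tooling.

```python
"""Sketch: fetch and parse this corpus's Dublin Core record via OAI-PMH."""
from __future__ import annotations

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Endpoint and identifier taken from the "Metadata Access" URL in this record.
OAI_ENDPOINT = "http://clarin.eurac.edu/repository/oai/request"
RECORD_ID = "oai:clarin.eurac.edu:20.500.12124/60"

# Standard OAI-PMH 2.0 and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_get_record_url(endpoint: str, identifier: str, prefix: str = "oai_dc") -> str:
    """Return a URL-encoded OAI-PMH GetRecord request for one record."""
    query = urllib.parse.urlencode(
        {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier}
    )
    return f"{endpoint}?{query}"


def parse_dc_fields(xml_data: str | bytes) -> dict[str, list[str]]:
    """Collect Dublin Core elements (title, creator, ...) from an oai_dc response."""
    root = ET.fromstring(xml_data)
    fields: dict[str, list[str]] = {}
    dc_root = root.find(".//oai_dc:dc", NS)
    if dc_root is None:
        return fields  # e.g. an OAI-PMH <error> response with no metadata
    for elem in dc_root:
        # Drop the "{namespace}" prefix from the tag, leaving e.g. "creator".
        name = elem.tag.split("}", 1)[-1]
        if elem.text and elem.text.strip():
            fields.setdefault(name, []).append(elem.text.strip())
    return fields


def fetch_record(identifier: str = RECORD_ID) -> dict[str, list[str]]:
    """Download and parse the record (requires network access)."""
    url = build_get_record_url(OAI_ENDPOINT, identifier)
    with urllib.request.urlopen(url, timeout=30) as resp:
        # Pass raw bytes so the XML parser honours the declared encoding.
        return parse_dc_fields(resp.read())
```

Calling `fetch_record()` needs network access and returns the record's Dublin Core fields keyed by element name (for example `title` or `creator`), with repeated elements such as multiple creators collected into one list.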
Provenance
Creator De Camillis, Flavia; Chiocchetti, Elena; Stemle, Egon W.
Publisher Institute for Applied Linguistics, Eurac Research
Publication Year 2023
Rights Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0); https://creativecommons.org/licenses/by-nc/4.0/; PUB
OpenAccess true
Contact clarin(at)eurac.edu
Representation
Language Italian; German
Resource Type corpus
Format text/html; application/zip; text/plain; charset=utf-8; downloadable_files_count: 5
Discipline Linguistics