Dataset - B2FIND

LegISTyr test set

LegISTyr is a machine translation test set for evaluating the quality of legal terminology translation from Italian to South Tyrolean German, a minor standard variety of German....

Evaluating MT for LIWC

Data belonging to the article 'Machine-translated texts as an alternative to translated dictionaries for LIWC'. The data are the result of computations described in the article....

FAUST cs-en 0.5

This machine translation test set contains 2223 Czech sentences collected within the FAUST project (https://ufal.mff.cuni.cz/grants/faust, http://hdl.handle.net/11234/1-3308)....

Hausa Visual Genome 1.0

Hausa Visual Genome 1.0, a multimodal dataset consisting of text and images suitable for English-to-Hausa multimodal machine translation tasks and multimodal research. We...

Many Czech References for 50 Sentences Selected from WMT11 Data

This dataset contains the complete, very large set of Czech translations for 50 English source sentences from the WMT11 test set (http://www.statmt.org/wmt11). In total, there are...

YouTube-ASL Clip Keypoint Dataset

The YouTube-ASL Clip Keypoint Dataset is a curated collection of sentence-level American Sign Language (ASL) keypoint sequences derived from publicly available YouTube videos....

WMT16 Tuning Shared Task Models (English-to-Czech)

This item contains models to tune for the WMT16 Tuning shared task for English-to-Czech. The CzEng 1.6pre corpus (http://ufal.mff.cuni.cz/czeng/czeng16pre) is used for the training...

Debiasing Algorithm through Model Adaptation

Debiasing Algorithm through Model Adaptation (DAMA) is based on guarding stereotypical gender signals and model editing. DAMA is performed on specific modules prone to convey...

QT21 Data

Post-editing and MQM annotations produced by the QT21 project. As described in @InProceedings{specia-etal_MTSummit:2017, author = {Specia, Lucia and Kim Harris and...

Test Data EN-DE APE Shared Task WMT17

Test data for the WMT 2017 Automatic post-editing task (the same used for the Sentence-level Quality Estimation task). They consist of 2,000 English-German pairs (source and...

Manually Classified Errors in Cs->Sk Translation

Manual classification of errors in Czech-Slovak translation according to the classification introduced by Vilar et al. [1]. The first 50 sentences from the WMT 2010 test set were...

Cesilko Web Service for Weblicht

Weblicht integration of Cesilko (http://hdl.handle.net/11858/00-097C-0000-0006-AAFE-A)

WMT18 Quality Estimation Shared Task Training and Development Data

Training and development data for the WMT18 QE task. Test data will be published as a separate item. This shared task will build on its previous six editions to further examine...

WMT16 Quality Estimation Shared Task Training and Development Data

Training and development data for the WMT16 QE task. Test data will be published as a separate item. This shared task will build on its previous four editions to further examine...

Khresmoi Query Translation Test Data 2.0

This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish and...

Test Data EN-DE MT_PBSMT APE Shared Task WMT18

Test data for the WMT 2018 Automatic post-editing task. They consist of English-German pairs (source and target) belonging to the information technology domain and already...

Automatic Paraphrases of Czech Reference Sentences for WMT11, 13 and 14

This dataset contains automatic paraphrases of Czech official reference translations for the Workshop on Statistical Machine Translation shared task. The data covers the years...

MTMonkey

MTMonkey is a web service which handles and distributes JSON-encoded HTTP requests for machine translation (MT) among multiple machines running an MT system, including text pre-...

The Use of Machine Translation by Ukrainian War Refugees in Czechia

Data from a questionnaire survey conducted from 2022-08-25 to 2022-11-15 and exploring the use of machine translation by Ukrainian refugees in the Czech Republic. The presented...

Ptakopět data: the dataset for experiments on outbound translation

The dataset used for the Ptakopět experiment on outbound machine translation. It consists of screenshots of web forms with user queries entered. The queries are also available...

85 datasets found