Dataset - B2FIND

EPIC-EuroParl-UdS: A GPT-2 and NMT Surprisal-Annotated Corpus for Translation...

EPIC-EuroParl-UdS is a bidirectional document- and sentence-aligned English–German corpus of European Parliament debates (up to mid-July 2018). It includes the official written...

Crowdsourcing Piedmontese to Test LLMs on Non-Standard Orthography

This dataset contains data for testing machine translation and topic classification in Piedmontese. It is based on FLORES+ (NLLB Team et al., 2024) and SIB-200: A Simple,...

X-SRL Dataset and mBERT Word Aligner

This code contains a method to automatically align words from parallel sentences by using multilingual BERT pre-trained embeddings. This can be used to transfer source...

A Gold Standard Word Alignment for English-Swedish

A Gold Standard Word Alignment for English-Swedish (GES) is a resource containing 1164 manually word-aligned sentence pairs from the English and Swedish versions of Europarl v. 2....

A Gold Standard Word Alignment for English-Swedish (2015-10-12)

A Gold Standard Word Alignment for English-Swedish (GES) is a resource containing 1164 manually word-aligned sentence pairs from the English and Swedish versions of Europarl v. 2.

Czech-English Manual Word Alignment

Corpus of manually aligned Czech-English parallel sentences. It comprises 2500 parallel sentences from 7 different sources.

X-SRL Dataset and mBERT Word Aligner

This code contains a method to automatically align words from parallel sentences by using multilingual BERT pre-trained embeddings. This can be used to transfer source...

7 datasets found
