- Bulgarian-English parallel corpus MaCoCu-bg-en 2.0
The Bulgarian-English parallel corpus MaCoCu-bg-en 2.0 was built by crawling the “.bg” and “.бг” internet top-level domains in 2021, extending the crawl dynamically to other...
- Macedonian-English parallel corpus MaCoCu-mk-en 2.0
The Macedonian-English parallel corpus MaCoCu-mk-en 2.0 was built by crawling the “.mk” and “.мкд” internet top-level domains in 2021, extending the crawl dynamically to other...
- Turkish-English parallel corpus MaCoCu-tr-en 2.0
The Turkish-English parallel corpus MaCoCu-tr-en 2.0 was built by crawling the “.tr” and “.cy” internet top-level domains in 2021, extending the crawl dynamically to other...
- Maltese-English parallel corpus MaCoCu-mt-en 2.0
The Maltese-English parallel corpus MaCoCu-mt-en 2.0 was built by crawling the ".mt" internet top-level domain in 2021, extending the crawl dynamically to other domains as well....
- Slovene-English parallel corpus slenWaC 1.0
The slenWaC corpus version 1.0 consists of parallel Slovene-English texts crawled from the .si top-level domain for Slovenia. The corpus was built with Spidextor...
- Serbian-English parallel corpus srenWaC 1.0
The srenWaC corpus consists of sentence-aligned parallel Serbian-English texts crawled from the .rs top-level domain for Serbia. The corpus was built with Spidextor...
- Croatian-English parallel corpus MaCoCu-hr-en 1.0
The Croatian-English parallel corpus MaCoCu-hr-en 1.0 was built by crawling the ".hr" internet top-level domain in 2021, extending the crawl dynamically to other domains as...
- Bilingual Corpus of Underground Mining (ELEXIS)
PodzemniRadovi-sr-en, a bilingual aligned corpus of papers from the field of mining. Undeground-mining-sr-en: bilingual texts from the Underground Mining Engineering journal (55...
- Macedonian-English parallel corpus MaCoCu-mk-en 1.0
The Macedonian-English parallel corpus MaCoCu-mk-en 1.0 was built by crawling the ".mk" and ".мкд" internet top-level domains in 2021, extending the crawl dynamically to other...
- Parallel Corpus (EN-LT-FR) of EUR-Lex Document Extracts That Include Terms wi...
Trilingual parallel corpus of EUR-Lex Document Extracts that include terms with colour names (black, white and grey). The size of the corpus is 23,198 words in English, 19,262...
- Parallel Corpus (EN-FR-LT) of EU Financial Documents (ELEXIS)
The parallel corpus comprises 154 EU legislative documents (English documents and their translations into French and Lithuanian) related to various financial issues and...
- DSI-enriched ParaCrawl 9 en-es corpus
This is a derivative work based on ParaCrawl release 9 English-Spanish (https://paracrawl.eu/). This version of the corpus includes a set of probabilities corresponding to the...
- Icelandic-English parallel corpus MaCoCu-is-en 2.0
The Icelandic-English parallel corpus MaCoCu-is-en 2.0 was built by crawling the “.is” internet top-level domain in 2021, extending the crawl dynamically to other domains as...
- Serbian-English parallel corpus MaCoCu-sr-en 1.0
The Serbian-English parallel corpus MaCoCu-sr-en 1.0 was built by crawling the “.rs” and “.срб” internet top-level domains in 2021 and 2022, extending the crawl dynamically to...
- Icelandic-English parallel corpus MaCoCu-is-en 1.0
The Icelandic-English parallel corpus MaCoCu-is-en 1.0 was built by crawling the ".is" internet top-level domain in 2021, extending the crawl dynamically to other domains as...
- Parallel corpus EN-SL RSDO4 1.0
The RSDO4 parallel corpus of English-Slovene and Slovene-English translation pairs was collected as part of work package 4 of the Slovene in the Digital Environment project. It...
- Croatian-English parallel corpus MaCoCu-hr-en 2.0
The Croatian-English parallel corpus MaCoCu-hr-en 2.0 was built by crawling the “.hr” internet top-level domain in 2021 and 2022, extending the crawl dynamically to other...
- Parallel corpus of idiomatic text ParaDiom 1.0
ParaDiom is a parallel corpus with sentences sampled from existing corpora. The corpus contains 1,000 Slovene sentences with their English translation and 1,000 English...
- Parallel corpus EN-SL RSDO4 2.0
The RSDO4 parallel corpus of English-Slovene and Slovene-English translation pairs was collected as part of work package 4 of the Slovene in the Digital Environment project. It...
- DSI-enriched ParaCrawl 9 en-nl corpus
This is a derivative work based on ParaCrawl release 9 English-Dutch (https://paracrawl.eu/). This version of the corpus includes a set of probabilities corresponding to the...