Catalan-English parallel corpus MaCoCu-ca-en 1.0
The Catalan-English parallel corpus MaCoCu-ca-en 1.0 was built by crawling the ".cat", ".es", ".ad", ".fr", ".it" and ".eu" internet top-level domains in 2022, extending the...
Slovene-English parallel corpus MaCoCu-sl-en 2.0
The Slovene-English parallel corpus MaCoCu-sl-en 2.0 was built by crawling the “.si” internet top-level domain in 2021 and 2022, extending the crawl dynamically to other domains...
Montenegrin-English parallel corpus MaCoCu-cnr-en 1.0
The Montenegrin-English parallel corpus MaCoCu-cnr-en 1.0 was built by crawling the “.me” internet top-level domain in 2021 and 2022, extending the crawl dynamically to other...
Maltese-English parallel corpus MaCoCu-mt-en 2.0
The Maltese-English parallel corpus MaCoCu-mt-en 2.0 was built by crawling the ".mt" internet top-level domain in 2021, extending the crawl dynamically to other domains as well....
Macedonian-English parallel corpus MaCoCu-mk-en 2.0
The Macedonian-English parallel corpus MaCoCu-mk-en 2.0 was built by crawling the “.mk” and “.мкд” internet top-level domains in 2021, extending the crawl dynamically to other...
Bosnian-English parallel corpus MaCoCu-bs-en 1.0
The Bosnian-English parallel corpus MaCoCu-bs-en 1.0 was built by crawling the “.ba” internet top-level domain in 2021 and 2022, extending the crawl dynamically to other domains...
Bulgarian-English parallel corpus MaCoCu-bg-en 1.0
The Bulgarian-English parallel corpus MaCoCu-bg-en 1.0 was built by crawling the ".bg" and ".бг" internet top-level domains in 2021, extending the crawl dynamically to other...
Slovene-English parallel corpus slenWaC 1.0
The slenWaC corpus version 1.0 consists of parallel Slovene-English texts crawled from the .si top-level domain for Slovenia. The corpus was built with Spidextor...
Bulgarian-English parallel corpus MaCoCu-bg-en 2.0
The Bulgarian-English parallel corpus MaCoCu-bg-en 2.0 was built by crawling the “.bg” and “.бг” internet top-level domains in 2021, extending the crawl dynamically to other...
Parallel corpus of idiomatic text ParaDiom 1.0
ParaDiom is a parallel corpus with sentences sampled from existing corpora. The corpus contains 1,000 Slovene sentences with their English translation and 1,000 English...
Macedonian-English parallel corpus MaCoCu-mk-en 1.0
The Macedonian-English parallel corpus MaCoCu-mk-en 1.0 was built by crawling the ".mk" and ".мкд" internet top-level domains in 2021, extending the crawl dynamically to other...
Greek-English parallel corpus MaCoCu-el-en 1.0
The Greek-English parallel corpus MaCoCu-el-en 1.0 was built by crawling the ".gr", ".ελ", ".cy" and ".eu" internet top-level domains in 2023, extending the crawl dynamically to...
Croatian-English parallel corpus MaCoCu-hr-en 2.0
The Croatian-English parallel corpus MaCoCu-hr-en 2.0 was built by crawling the “.hr” internet top-level domain in 2021 and 2022, extending the crawl dynamically to other...
Serbian-English parallel corpus srenWaC 1.0
The srenWaC corpus consists of sentence-aligned parallel Serbian-English texts crawled from the .rs top-level domain for Serbia. The corpus was built with Spidextor...
Parallel corpus EN-SL RSDO4 2.0
The RSDO4 parallel corpus of English-Slovene and Slovene-English translation pairs was collected as part of work package 4 of the Slovene in the Digital Environment project. It...
Albanian-English parallel corpus MaCoCu-sq-en 1.0
The Albanian-English parallel corpus MaCoCu-sq-en 1.0 was built by crawling the “.al” internet top-level domain in 2022, extending the crawl dynamically to other domains as...
Parallel Corpus (EN-LT-FR) of EUR-Lex Document Extracts That Include Terms wi...
Trilingual parallel corpus of EUR-Lex document extracts that include terms with colour names (black, white and grey). The size of the corpus is 23,198 words in English, 19,262...
Parallel sense-annotated corpus ELEXIS-WSD 1.1
ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.1 contains sentences for 10...
Icelandic-English parallel corpus MaCoCu-is-en 1.0
The Icelandic-English parallel corpus MaCoCu-is-en 1.0 was built by crawling the ".is" internet top-level domain in 2021, extending the crawl dynamically to other domains as...
Bilingual Corpus of Underground Mining (ELEXIS)
PodzemniRadovi-sr-en, a bilingual aligned corpus of papers from the field of mining. Undeground-mining-sr-en: bilingual texts from the Underground Mining Engineering journal (55...