-
DuST - Dutch Stance Twitter
This repository contains the full text format and the annotations for the Dutch Stance Twitter (DuST) dataset. The dataset is part of a larger annotation initiative, the Dynamic... -
Slovene-Japanese Learner's Dictionary sloJa 1.1
The Slovenian-Japanese online dictionary for Slovenian-speaking learners of Japanese was compiled by extracting and converting the Japanese-Slovenian dictionary jaSlo 3.1... -
Slovene-Japanese Learner's Dictionary sloJa 1.0
The Slovenian-Japanese online dictionary for Slovenian-speaking learners of Japanese was compiled by extracting and converting the Japanese-Slovenian dictionary jaSlo 3.1... -
BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)
BPEmb is a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained entity typing as testbed,... -
NameTag 3 Multilingual CoNLL Model
This is a trained model for the supervised machine learning tool NameTag 3 (https://ufal.mff.cuni.cz/nametag/3/), trained jointly on several NE corpora: English CoNLL-2003,... -
Universal Segmentations 1.0 (UniSegments 1.0)
Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations harmonised into a cross-linguistically consistent annotation... -
Multilingual corpus of juridical texts
International conventions and treaties arranged as a parallel corpus aligned at the paragraph level -
Preamble 1.0
Preamble 1.0 is a multilingual annotated corpus of the preamble of the EU REGULATION 2020/2092 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL. The corpus consists of four... -
NameTag 3 Multilingual Model 250203
This is a trained model for the supervised machine learning tool NameTag 3 (https://ufal.mff.cuni.cz/nametag/3/). NameTag 3 is an open-source tool for both flat and nested named... -
UFAL Speech Corpus of North Levantine Arabic 1.0 - Part 2
The corpus contains recordings by native speakers of North Levantine Arabic (apc) acquired during 2020, 2021, and 2023 in Prague, Paris, Kabardia, and St. Petersburg.... -
Hindi Visual Genome 1.1
Hindi Visual Genome 1.1 is an updated version of Hindi Visual Genome 1.0. The update concerns primarily the text part of Hindi Visual Genome, fixing translation issues... -
UFAL Speech Corpus of North Levantine Arabic 1.0 - Part 3
The corpus contains recordings by native speakers of North Levantine Arabic (apc) acquired during 2020, 2021, and 2023 in Prague, Paris, Kabardia, and St. Petersburg.... -
Prague Czech-English Dependency Treebank 2.0 Coref
The Prague Czech-English Dependency Treebank 2.0 Coref (PCEDT 2.0 Coref) is a parallel treebank building upon the original PCEDT 2.0 release and enriching it with the extended... -
Large-Scale Colloquial Persian 0.5
"Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a... -
UFAL Speech Corpus of North Levantine Arabic 1.0 - Part 1
The corpus contains recordings by native speakers of North Levantine Arabic (apc) acquired during 2020, 2021, and 2023 in Prague, Paris, Kabardia, and St. Petersburg.... -
UFAL Parallel Corpus of North Levantine 1.0
This is the first release of the UFAL Parallel Corpus of North Levantine, compiled by the Institute of Formal and Applied Linguistics (ÚFAL) at Charles University within the... -
QTLeap WSD/NED corpus
This corpus is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). The texts are Q&A interactions from the... -
Hindi Visual Genome 1.0
Hindi Visual Genome 1.0 is a multimodal dataset consisting of text and images suitable for the English-to-Hindi multimodal machine translation task and multimodal research. We... -
PAWS
PAWS is a multi-lingual parallel treebank with coreference annotation. It consists of English texts from the Wall Street Journal translated into Czech, Russian and Polish. In... -
Czech Malach Cross-lingual Speech Retrieval Test Collection
The package contains Czech recordings from the Visual History Archive, which consists of interviews with Holocaust survivors. The archive consists of audio recordings, four...
