Optimal reference translation of English-Czech WMT2020
We define "optimal reference translation" as a translation thought to be the best possible that can be achieved by a team of human translators. Optimal reference translations...
ParaDi 2.0
ParaDi 2.0 is a dictionary of single verb paraphrases of Czech verbal multiword expressions: light verb constructions and idiomatic verb constructions. Moreover, it provides...
Vystadial 2013 – scripts
Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems....
czTenTen12 v9 subcorpus of problematic phenomena
A subcorpus of czTenTen12 v9 containing problematic features (interlingual homographs, foreign proper names, named entities).
STYX 1.0
STYX 1.0 is a corpus of Czech sentences selected from the Prague Dependency Treebank. The criterion for including sentences in STYX was their suitability for practicing Czech...
Arabic Enclitics Lexicon
An XML file containing all Arabic enclitics.
Indonesian web corpus (idWac)
An Indonesian text corpus built from the web. Crawled by SpiderLing in 2017, filtered by JusText and Onion (see http://corpus.tools/ for details), and tagged and lemmatized by MorphInd...
ForFun 1.0
ForFun is a database of linguistic forms and their syntactic functions, built using the multi-layer annotated corpora of Czech, the Prague Dependency Treebanks. The...
Deep Universal Dependencies 2.5
Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3105). It contains additional...
ParaDi: Dictionary of Paraphrases of Czech Complex Predicates with Light Verbs
Dictionary of single verb paraphrases of Czech light verb constructions.
MADED
Moroccan Dialect Electronic Dictionary (MDED) is an electronic lexicon containing almost 15,000 MSA entries. They are written in Arabic letters and translated into Moroccan Arabic...
EvaldioData 1.0
EvaldioData 1.0 is a corpus of spoken performances by non-native speakers of Czech. It includes recordings capturing the oral part of the Czech Language Certificate...
Restaurant Reviews CZ ABSA corpus v2
Restaurant Reviews CZ ABSA: 2.15k reviews with their related targets and categories. The work is described in the paper: https://doi.org/10.13053/CyS-20-3-2469
Optimal Reference Translations from English to Czech
This corpus contains annotations of translation quality from English to Czech in seven categories on both segment- and document-level. There are 20 documents in total, each with...
EduPo: Analysis and Generation of Czech Poetry, v0.5
A suite of tools for analysis and generation of Czech poetry. This is a snapshot of the public GitHub repository at https://github.com/ufal/edupo, the beta version of the tool...
Continuous Rating; Supplementary materials
Data collected in the Continuous Rating evaluation study: Continuous Rating scores and questionnaires.
Open SDP 1.2
The original SDP 2014 and 2015 data collections were made available under task-specific ‘evaluation’ licenses to registered SemEval participants. In mid-2016, all original data...
Czech Models (MorfFlex CZ 161115 + PDT 3.0) for MorphoDiTa 161115
Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ...
Diakorp v6: diachronic corpus of Czech
A diachronic corpus of Czech comprising 3.45 million words (i.e., 4.1 million tokens). It contains 116 texts from the 14th to the 20th century. The texts are transcribed, not...
HamleDT 2.0
HamleDT 2.0 is a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a...
