1,492 datasets found

Filters: downloadable_files_count: 1; Repositories: CLARIN

  • Optimal reference translation of English-Czech WMT2020

    We define "optimal reference translation" as a translation thought to be the best possible that can be achieved by a team of human translators. Optimal reference translations...
  • ParaDi 2.0

    ParaDi 2.0 is a dictionary of single verb paraphrases of Czech verbal multiword expressions: light verb constructions and idiomatic verb constructions. Moreover, it provides...
  • Vystadial 2013 – scripts

    Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems....
  • czTenTen12 v9 subcorpus of problematic phenomena

    czTenTen12 v9 subcorpus containing problematic features (interlingual homographs, foreign proper names, named entities)
  • STYX 1.0

    STYX 1.0 is a corpus of Czech sentences selected from the Prague Dependency Treebank. The criterion for including sentences in STYX was their suitability for practicing Czech...
  • Arabic Enclitics Lexicon

    An XML-based file containing all Arabic enclitics
  • Indonesian web corpus (idWac)

    Indonesian text corpus from the web, crawled by SpiderLing in 2017. Filtered by JusText and Onion (see http://corpus.tools/ for details). Tagged and lemmatized by MorphInd...
  • ForFun 1.0

    ForFun is a database of linguistic forms and their syntactic functions, built using the multi-layer annotated corpora of Czech, the Prague Dependency Treebanks. The...
  • Deep Universal Dependencies 2.5

    Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3105). It contains additional...
  • ParaDi: Dictionary of Paraphrases of Czech Complex Predicates with Light Verbs

    Dictionary of single verb paraphrases of Czech light verb constructions.
  • MADED

    Moroccan Dialect Electronic Dictionary (MDED) is an electronic lexicon containing almost 15000 MSA entries. They are written in Arabic letters and translated to Moroccan Arabic...
  • EvaldioData 1.0

    EvaldioData 1.0 is a language corpus of spoken performances by non-native speakers of Czech. It includes recordings capturing the oral part of the Czech Language Certificate...
  • Restaurant Reviews CZ ABSA corpus v2

    Restaurant Reviews CZ ABSA: 2.15k reviews with their related targets and categories. The work is described in the paper: https://doi.org/10.13053/CyS-20-3-2469
  • Optimal Reference Translations from English to Czech

    This corpus contains annotations of translation quality from English to Czech in seven categories on both segment- and document-level. There are 20 documents in total, each with...
  • EduPo: Analysis and Generation of Czech Poetry, v0.5

    A suite of tools for analysis and generation of Czech poetry. This is a snapshot of the public GitHub repository at https://github.com/ufal/edupo -- the beta version of the tool...
  • Continuous Rating; Supplementary materials

    Data collected in the Continuous Rating evaluation study: Continuous Rating scores and questionnaires.
  • Open SDP 1.2

    The original SDP 2014 and 2015 data collections were made available under task-specific ‘evaluation’ licenses to registered SemEval participants. In mid-2016, all original data...
  • Czech Models (MorfFlex CZ 161115 + PDT 3.0) for MorphoDiTa 161115

    Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ...
  • Diakorp v6: diachronic corpus of Czech

    Diachronic corpus of Czech comprising 3.45 million words (i.e. 4.1 million tokens). It contains 116 texts from the 14th to the 20th century. The texts are transcribed, not...
  • HamleDT 2.0

    HamleDT 2.0 is a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a...