51 datasets found

Keywords: Czech

  • Imperative Benefit Evaluation

    The contribution includes the data frame and the R script (Markdown file) belonging to the paper "Who Benefits from an Imperative? Assessment of Directives on a Benefit-Scale"...
  • NomVallex 2.5

    NomVallex is a manually annotated valency lexicon of Czech nouns and adjectives, adopting the theoretical framework of Functional Generative Description as its theoretical...
  • Czech Text Document Corpus v 2.0

    Czech Text Document Corpus v 2.0 is a collection of text documents for automatic document classification in the Czech language. It is composed of the text...
  • Czech Legal Text Treebank

    The Czech Legal Text Treebank (CLTT) is a collection of 1133 manually annotated dependency trees. CLTT consists of two legal documents: The Accounting Act (563/1991 Coll., as...
  • Czech OOV Inflection Dataset

    Czech OOV Inflection Dataset is a Czech inflection dataset of nouns, focused on evaluation in out-of-vocabulary (OOV) conditions. It consists of two parts: a standard...
  • VALLEX 2.5

    The Valency Lexicon of Czech Verbs, Version 2.5 (VALLEX 2.5), is a collection of linguistically annotated data and documentation, resulting from an attempt at formal description...
  • CWC2011

    Web corpus of Czech, created in 2011. Contains newspapers and magazines, discussions, and blogs. See http://www.lrec-conf.org/proceedings/lrec2012/summaries/120.html for details.
  • CoNLL-based Extended Czech Named Entity Corpus 1.0

    This is the Czech Named Entity Corpus 1.0 transformed into the CoNLL format. The original corpus can be downloaded from: http://hdl.handle.net/11858/00-097C-0000-0023-1B04-C. The...
  • CoCzeFLA Chroma 2023.04

    A new version of the previously published corpus Chroma. The version 2023.04 includes six children. Two transcripts (Julie20221, Klara30424) were removed since they did not meet...
  • VALLEX 4.5

    VALLEX 4.5 provides information on the valency structure (combinatorial potential) of Czech verbs in their particular senses (almost 4,700 verbs in more than 11,080 lexical...
  • MorfFlex CZ 161115

    Czech morphological dictionary developed originally by Jan Hajič as a spelling checker and lemmatization dictionary. Currently it contains full morphological information for...
  • NomVallex I.

    The NomVallex I. lexicon describes valency of Czech deverbal nouns belonging to three semantic classes, i.e. Communication (dotaz 'question'), Mental Action (plán 'plan') and...
  • Khresmoi Summary Translation Test Data 1.1

    This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German.
  • Czech Models (CNEC) for NameTag

    Czech models for NameTag, providing recognition of named entities. The models are trained on Czech Named Entity Corpus 2.0 and 1.1.
  • MorfFlex CZ

    Czech morphological dictionary developed originally by Jan Hajič as a spelling checker and lemmatization dictionary. Currently it contains full morphological information for...
  • SQAD v2

    Simple question answering database (SQAD) created from Czech Wikipedia. Each record of SQAD consists of four files (in vertical form provided with lemmatization and POS tagging)...
  • VALLEX 4.0 (2021-02-12)

    VALLEX 4.0 provides information on the valency structure (combinatorial potential) of verbs in their particular senses; each sense is characterized by a gloss and examples. VALLEX 4.0...
  • Human Label Variation in Attribution and Discourse (Hlava AD)

    Human Label Variation in Attribution and Discourse (Hlava AD) is a collection of commented multiple annotations (5 annotators) of inter-sentential explicit discourse relations...
  • Semantic annotation of noun/verb conversion in Czech

    The item contains a list of 2,058 noun/verb conversion pairs along with related formations (word-formation paradigms) provided with linguistic features, including semantic...
  • Khresmoi Summary Translation Test Data 2.0

    This package contains data sets for development (Section dev) and testing (Section test) of machine translation of sentences from summaries of medical articles between Czech,...
You can also access this registry using the API (see API Docs).