51 datasets found

Keywords: Czech

  • Stereotypes and Discourse Connectors in Czech

    The purpose of the dataset is to test three variables: (i) the effect of argument order in Ale-constructions (But-constructions) “A, ale B” (“A, but B”): positive A, but...
  • Possessive Pronoun Preference

    The contribution includes the data frames and the R script (Markdown file) belonging to the paper "Morphological and Pragmatic Conditioning of Reflexivity in Possessive...
  • Czech HS Contracts Dataset (CHSC) 1.0

    The Czech Contracts dataset was created as part of the thesis Low-resource Text Classification (2021), A. Szabó, MFF UK. The contracts were obtained from the Hlídač Státu web portal....
  • Czech Models for Korektor 2

    The Czech models for Korektor 2 created by Michal Richter, 02 Feb 2013. The models can either perform spellchecking and grammarchecking, or only generate diacritical marks.
  • NomVallex 2.0

    NomVallex 2.0 is a manually annotated valency lexicon of Czech nouns and adjectives, created in the theoretical framework of the Functional Generative Description and based on...
  • Czech Lexico-Semantic Database 0.1

    A lexicographical project, whose aim is to digitize and align two Czech onomasiological dictionaries (Haller 1969–77; Klégr 2007) in order to create an integrated digital...
  • Czech Models (MorfFlex CZ + PDT) for MorphoDiTa

    Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ and...
  • RobeCzech Base

    RobeCzech is a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that...
  • FERNET-C5

    FERNET-C5 is a monolingual BERT language representation model trained from scratch on the Czech Colossal Clean Crawled Corpus (C5) data - a Czech version of the English C4...
  • Khresmoi Query Translation Test Data 2.0

    This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish and...
  • Processing of intraclausal garden-path structures in Czech

    Experimental materials, data and R scripts used in the paper "Garden-path sentences and the diversity of their (mis)representations" (Ceháková - Chromý, 2023).
  • Czech Models (MorfFlex CZ 161115 + PDT 3.0) for MorphoDiTa 161115

    Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ...
  • Diakorp v6: diachronic corpus of Czech

    Diachronic corpus of Czech containing 3.45 million words (i.e. 4.1 million tokens). It comprises 116 texts from the 14th to the 20th century. The texts are transcribed, not...
  • Large Corpus of Czech Parliament Plenary Hearings

    We present a large corpus of Czech parliament plenary sessions. The corpus consists of approximately 444 hours of speech data and corresponding text transcriptions. The whole...
  • Czech Models (MorfFlex CZ 2.0 + PDT-C 1.0) for MorphoDiTa 220710

    Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ 2.0,...
  • Czech Models (MorfFlex CZ 160310 + PDT 3.0) for MorphoDiTa 160310

    Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ...
  • VALLEX 3.0

    VALLEX 3.0 provides information on the valency structure (combinatorial potential) of verbs in their particular senses, which are characterized by glosses and examples. VALLEX...
  • sqad 3.0

    Simple question answering database version 3 (SQAD v3) created from Czech Wikipedia. The new version consists of 13477 records. Each SQAD record consists of multiple files -...
  • CoNLL-based Extended Czech Named Entity Corpus 2.0

    This is the Czech Named Entity Corpus 2.0 transformed into the CoNLL format. The original corpus can be downloaded from: http://hdl.handle.net/11858/00-097C-0000-0023-1B22-8. The...
  • Diffusion of phonetic updates within phonological neighborhoods, ELOPE, Data

    Phonological neighborhood density is known to influence lexical access, speech production as well as perception processes. Lexical competition is thought to be the central...
You can also access this registry using the API (see API Docs).