55 datasets found

Keywords: Czech

  • Multi-Dimensional Analysis of Czech

    Original data for a general-purpose multi-dimensional analysis model of register variation in Czech. This post contains a CSV data set of 137 linguistic features measured on...
  • Czech word and MWE lists

    This post contains word and MWE (multi-word expression) lists used for the operationalization of some of the linguistic features in the multi-dimensional analysis (MDA) of Czech...
  • Replication data for: V-temporal adverbials in Slavic

    The database includes 271 Russian examples and their equivalents in Ukrainian, Belarusian, Polish and Czech. The data were culled from the ParaSol parallel corpus (see...
  • Parent-child conversations about motion events (Russian, Russian-German, Czech)

    The dataset contains transcripts of parent-child communication over picture stimuli depicting motion events. The transcripts are partly coded and transcribed for the purpose of...
  • Data from the project Sociolinguistic analysis of the use of prothetic /v/ in...

    Data from the project Sociolinguistic analysis of the use of prothetic /v/ in Czech. Altogether, 28 893 tokens of words which may contain prothetic v- taken from sociolinguistic...
  • Metonymy in Word-Formation: Russian, Czech, and Norwegian

    Publication abstract: A foundational goal of cognitive linguistics is to explain linguistic phenomena in terms of general cognitive strategies rather than postulating an...
  • Czech Models (MorfFlex CZ 2.0 + PDT-C 1.0) for MorphoDiTa 220710

    Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ 2.0,...
  • Stereotypes and Discourse Connectors in Czech

    The purpose of the dataset is to test three variables: (i) the effect of argument order in Ale-constructions (But-constructions) “A, ale B” (“A, but B”): positive A, but...
  • Possessive Pronoun Preference

    The contribution includes the data frames and the R script (Markdown file) belonging to the paper "Morphological and Pragmatic Conditioning of Reflexivity in Possessive...
  • Czech HS Contracts Dataset (CHSC) 1.0

    The Czech Contracts dataset was created as part of the thesis Low-resource Text Classification (2021), A. Szabó, MFF UK. Contracts are obtained from the Hlídač Státu web portal....
  • Czech Models for Korektor 2

    Czech models for Korektor 2, created by Michal Richter, 02 Feb 2013. The models can either perform spellchecking and grammar checking, or only generate diacritical marks.
  • NomVallex 2.0

    NomVallex 2.0 is a manually annotated valency lexicon of Czech nouns and adjectives, created in the theoretical framework of the Functional Generative Description and based on...
  • Czech Lexico-Semantic Database 0.1

    A lexicographical project, whose aim is to digitize and align two Czech onomasiological dictionaries (Haller 1969–77; Klégr 2007) in order to create an integrated digital...
  • Czech Models (MorfFlex CZ + PDT) for MorphoDiTa

    Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ and...
  • RobeCzech Base

    RobeCzech is a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that...
  • FERNET-C5

    FERNET-C5 is a monolingual BERT language representation model trained from scratch on the Czech Colossal Clean Crawled Corpus (C5) data - a Czech variant of the English C4...
  • Khresmoi Query Translation Test Data 2.0

    This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish and...
  • Processing of intraclausal garden-path structures in Czech

    Experimental materials, data and R scripts used in the paper "Garden-path sentences and the diversity of their (mis)representations" (Ceháková - Chromý, 2023).
  • Czech Models (MorfFlex CZ 161115 + PDT 3.0) for MorphoDiTa 161115

    Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ...
  • Diakorp v6: diachronic corpus of Czech

    A diachronic corpus of Czech comprising 3.45 million words (i.e. 4.1 million tokens). It contains 116 texts from the 14th to the 20th century. The texts are transcribed, not...
You can also access this registry using the API (see API Docs).