-
Multi-Dimensional Analysis of Czech
Original data for a general-purpose multi-dimensional analysis model of register variation in Czech. This post contains a CSV data set of 137 linguistic features measured on... -
Czech word and MWE lists
This post contains word and MWE (multi-word expression) lists used for the operationalization of some of the linguistic features in the multi-dimensional analysis (MDA) of Czech... -
Replication data for: V-temporal adverbials in Slavic
The database includes 271 Russian examples and their equivalents in Ukrainian, Belarusian, Polish and Czech. The data were culled from the ParaSol parallel corpus (see... -
Parent-child conversations about motion events (Russian, Russian-German, Czech)
The dataset contains transcripts of parent-child communication over picture stimuli depicting motion events. The transcripts are partly coded and transcribed for the purpose of... -
Data from the project Sociolinguistic analysis of the use of prothetic /v/ in...
Data from the project Sociolinguistic analysis of the use of prothetic /v/ in Czech. Altogether, 28 893 tokens of words which may contain prothetic v- taken from sociolinguistic... -
Metonymy in Word-Formation: Russian, Czech, and Norwegian
Publication abstract: A foundational goal of cognitive linguistics is to explain linguistic phenomena in terms of general cognitive strategies rather than postulating an... -
Czech Models (MorfFlex CZ 2.0 + PDT-C 1.0) for MorphoDiTa 220710
Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ 2.0,... -
Stereotypes and Discourse Connectors in Czech
The purpose of the dataset is to test three variables: (i) the effect of argument order in Ale-constructions (But-constructions) “A, ale B” (“A, but B”): positive A, but... -
Possessive Pronoun Preference
The contribution includes the data frames and the R script (Markdown file) belonging to the paper "Morphological and Pragmatic Conditioning of Reflexivity in Possessive... -
Czech HS Contracts Dataset (CHSC) 1.0
The Czech Contracts dataset was created as part of the thesis Low-resource Text Classification (2021), A. Szabó, MFF UK. Contracts are obtained from the Hlídač Státu web portal.... -
Czech Models for Korektor 2
Czech models for Korektor 2, created by Michal Richter, 02 Feb 2013. The models can either perform spellchecking and grammar checking, or only generate diacritical marks. -
NomVallex 2.0
NomVallex 2.0 is a manually annotated valency lexicon of Czech nouns and adjectives, created in the theoretical framework of the Functional Generative Description and based on... -
Czech Lexico-Semantic Database 0.1
A lexicographical project, whose aim is to digitize and align two Czech onomasiological dictionaries (Haller 1969–77; Klégr 2007) in order to create an integrated digital... -
Czech Models (MorfFlex CZ + PDT) for MorphoDiTa
Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ and... -
RobeCzech Base
RobeCzech is a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that... -
FERNET-C5
The FERNET-C5 is a monolingual BERT language representation model trained from scratch on the Czech Colossal Clean Crawled Corpus (C5) data - a Czech mutation of the English C4... -
Khresmoi Query Translation Test Data 2.0
This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish and... -
Processing of intraclausal garden-path structures in Czech
Experimental materials, data and R scripts used in the paper "Garden-path sentences and the diversity of their (mis)representations" (Ceháková - Chromý, 2023). -
Czech Models (MorfFlex CZ 161115 + PDT 3.0) for MorphoDiTa 161115
Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ... -
Diakorp v6: diachronic corpus of Czech
A diachronic corpus of Czech comprising 3.45 million words (i.e. 4.1 million tokens). It contains 116 texts from the 14th to the 20th century. The texts are transcribed, not...
