Dataset - B2FIND

Large Corpus of Czech Parliament Plenary Hearings

We present a large corpus of Czech parliament plenary sessions. The corpus consists of approximately 444 hours of speech data and corresponding text transcriptions. The whole...

Czech Models (MorfFlex CZ 160310 + PDT 3.0) for MorphoDiTa 160310

Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex CZ...

VALLEX 3.0

VALLEX 3.0 provides information on the valency structure (combinatorial potential) of verbs in their particular senses, which are characterized by glosses and examples. VALLEX...

sqad 3.0

Simple question answering database version 3 (SQAD v3), created from Czech Wikipedia. The new version consists of 13477 records. Each record of SQAD consists of multiple files -...

CoNLL-based Extended Czech Named Entity Corpus 2.0

This is a Czech Named Entity Corpus 2.0 transformed into the CoNLL format. The original corpus can be downloaded from: http://hdl.handle.net/11858/00-097C-0000-0023-1B22-8. The...

Diffusion of phonetic updates within phonological neighborhoods, ELOPE, Data

Phonological neighborhood density is known to influence lexical access, speech production as well as perception processes. Lexical competition is thought to be the central...

Imperative Benefit Evaluation

The contribution includes the data frame and the R script (Markdown file) belonging to the paper "Who Benefits from an Imperative? Assessment of Directives on a Benefit-Scale"...

NomVallex 2.5

NomVallex is a manually annotated valency lexicon of Czech nouns and adjectives, adopting the theoretical framework of Functional Generative Description as its theoretical...

Czech Text Document Corpus v 2.0

Basic information: Czech Text Document Corpus v 2.0 is a collection of text documents for automatic document classification in the Czech language. It is composed of the text...

Czech Legal Text Treebank

The Czech Legal Text Treebank (CLTT) is a collection of 1133 manually annotated dependency trees. CLTT consists of two legal documents: The Accounting Act (563/1991 Coll., as...

Czech OOV Inflection Dataset

Czech OOV Inflection Dataset is a Czech inflection dataset of nouns, focused on evaluation in out-of-vocabulary (OOV) conditions. It consists of two parts: a standard...

VALLEX 2.5

The Valency Lexicon of Czech Verbs, Version 2.5 (VALLEX 2.5), is a collection of linguistically annotated data and documentation, resulting from an attempt at formal description...

CWC2011

Web corpus of Czech, created in 2011. Contains newspapers and magazines, discussions, and blogs. See http://www.lrec-conf.org/proceedings/lrec2012/summaries/120.html for details.

CoNLL-based Extended Czech Named Entity Corpus 1.0

This is a Czech Named Entity Corpus 1.0 transformed into the CoNLL format. The original corpus can be downloaded from: http://hdl.handle.net/11858/00-097C-0000-0023-1B04-C. The...

CoCzeFLA Chroma 2023.04

A new version of the previously published corpus Chroma. Version 2023.04 includes six children. Two transcripts (Julie20221, Klara30424) were removed since they did not meet...

VALLEX 4.5

VALLEX 4.5 provides information on the valency structure (combinatorial potential) of Czech verbs in their particular senses (almost 4 700 verbs in more than 11 080 lexical...

MorfFlex CZ 161115

Czech morphological dictionary developed originally by Jan Hajič as a spelling checker and lemmatization dictionary. Currently it contains full morphological information for...

NomVallex I.

The NomVallex I. lexicon describes valency of Czech deverbal nouns belonging to three semantic classes, i.e. Communication (dotaz 'question'), Mental Action (plán 'plan') and...

Khresmoi Summary Translation Test Data 1.1

This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German.

Czech Models (CNEC) for NameTag

Czech models for NameTag, providing recognition of named entities. The models are trained on Czech Named Entity Corpus 2.0 and 1.1.

55 datasets found