Dataset - B2FIND

MorfFlex CZ

Czech morphological dictionary developed originally by Jan Hajič as a spelling checker and lemmatization dictionary. Currently it contains full morphological information for...

SQAD v2

Simple question answering database (SQAD) created from Czech Wikipedia. Each record of SQAD consists of four files (in vertical form provided with lemmatization and POS tagging)...

VALLEX 4.0 (2021-02-12)

VALLEX 4.0 provides information on the valency structure (combinatorial potential) of verbs in their particular senses; each sense is characterized by a gloss and examples. VALLEX 4.0...

Human Label Variation in Attribution and Discourse (Hlava AD)

Human Label Variation in Attribution and Discourse (Hlava AD) is a collection of commented multiple annotations (5 annotators) of inter-sentential explicit discourse relations...

Semantic annotation of noun/verb conversion in Czech

The item contains a list of 2,058 noun/verb conversion pairs along with related formations (word-formation paradigms) provided with linguistic features, including semantic...

Khresmoi Summary Translation Test Data 2.0

This package contains data sets for development (Section dev) and testing (Section test) of machine translation of sentences from summaries of medical articles between Czech,...

ORTOFON v1: balanced corpus of informal spoken Czech with multi-tier transcri...

ORTOFON v1 is designed as a representation of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) in the area of the whole...

Czech Named Entity Corpus 1.0

The presented Czech Named Entity Corpus 1.0 is the first publicly available corpus providing a large body of manually annotated named entities in Czech sentences, including a...

NameTag 3 Czech CNEC 2.0 Model

This is a trained model for the supervised machine learning tool NameTag 3 (https://ufal.mff.cuni.cz/nametag/3/), trained on the Czech Named Entity Corpus 2.0...

MorfFlex CZ 2.0

MorfFlex CZ 2.0 is the Czech morphological dictionary developed originally by Jan Hajič as a spelling checker and lemmatization dictionary. MorfFlex is a flat list of...

CoCzeFLA Chroma 2023.07

A new version of the previously published corpus Chroma with morphological annotation. The version 2023.07 differs from 2023.04 in that it includes all seven children and it went...

MorfFlex CZ 160310

Czech morphological dictionary developed originally by Jan Hajič as a spelling checker and lemmatization dictionary. Currently it contains full morphological information for...

MorfFlex CZ 2.1 (2024-12-23)

MorfFlex CZ 2.1 is the Czech morphological dictionary developed originally by Jan Hajič as a spelling checker and lemmatization dictionary. MorfFlex CZ 2.1 is a part of the...

sqad 2.1

Simple question answering database version 2.1 (SQAD_v2.1) created from Czech Wikipedia. Each record of SQAD consists of four files (in vertical form provided with lemmatization...

55 datasets found