Dataset - B2FIND

PONK Linguistic Rules: Linguistic Rules and Metrics for PONK

Tool for linguistic analysis of Czech legal text readability, comprising a set of linguistic rules and stylometric and readability metrics. It is a module of PONK, an...

LiFR-Lite

Corpus of Czech educational texts for readability studies, with paraphrases, measured reading comprehension, and a multi-annotator subjective rating of selected text features...

LiFR-Lite (2021-11-05)

Corpus of Czech educational texts for readability studies, with paraphrases, measured reading comprehension, and a multi-annotator subjective rating of selected text features...

LiFR-Law. Corpus of Paraphrased Czech Administrative Texts with Reading Compr...

LiFR-Law is a corpus of Czech legal and administrative texts with measured reading comprehension and a subjective expert annotation of diverse textual properties based on the...

LiFR-Law. Corpus of Paraphrased Czech Administrative Texts with Reading Compr...

LiFR-Law is a corpus of Czech legal and administrative texts with measured reading comprehension and a subjective expert annotation of diverse textual properties based on the...

KUK 1.0

KUK 1.0 is a corpus of Czech legal and administrative texts accompanied by extensive metadata for automatic assessment of accessibility (comprehensibility or...

KUKY1.0

KUKY is a curated selection of 224 Czech administrative and legal documents for readability research, stored in two JSON files. The documents come partly from public databases...

ensiwiki-2011 dataset for readability modelling

The ensiwiki dataset contains Wikipedia pages sampled from Simple-English and regular English Wikipedia. For each Simple-English page, a paired page was sampled from the regular...

Reference List of Slovene Frequent Common Words

The reference list of the most frequent Slovene common words was prepared by selecting vocabulary at the intersection of the 10,000 most frequent lemmas of four Slovene text...

9 datasets found