Dataset - B2FIND

Dataset of Authentic and Synthetic Slovene Language Errors DASSLE 1.0

DASSLE 1.0 (Dataset of Authentic and Synthetic Slovene Language Errors) comprises 7,385 manually prepared entries, each consisting of a Slovene sentence containing a single,...

CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)

CzeSL-GEC is a corpus containing sentence pairs of original and corrected versions of Czech sentences collected from essays written by both non-native learners of Czech and...

AKCES-GEC Grammatical Error Correction Dataset for Czech

AKCES-GEC is a grammar error correction corpus for Czech generated from a subset of AKCES. It contains train, dev and test files annotated in M2 format. Note that in comparison...
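For readers unfamiliar with M2, the following is a minimal reading sketch. It assumes the files follow the conventional M2 layout from the CoNLL-2014 shared task: an `S` line holding the tokenized source sentence, followed by `A` lines of the form `start end|||type|||correction|||...|||annotator`. The names `Edit`, `Sentence`, and `parse_m2` are hypothetical, and the sample string is invented for illustration; it is not taken from AKCES-GEC, and its error-type label may not match the corpus's own tag set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Edit:
    start: int        # token offset where the edit begins
    end: int          # token offset where the edit ends (exclusive)
    error_type: str   # label as written in the file
    correction: str   # replacement text for tokens[start:end]
    annotator: int    # annotator id (last field of the A line)


@dataclass
class Sentence:
    tokens: List[str]
    edits: List[Edit] = field(default_factory=list)


def parse_m2(text: str) -> List[Sentence]:
    """Parse M2-formatted text into sentences with their edits."""
    sentences: List[Sentence] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("S "):
            # A new source sentence; tokens are whitespace-separated.
            sentences.append(Sentence(tokens=line[2:].split()))
        elif line.startswith("A ") and sentences:
            # An annotation attached to the most recent sentence.
            fields = line[2:].split("|||")
            start, end = map(int, fields[0].split())
            sentences[-1].edits.append(
                Edit(start, end, fields[1], fields[2], int(fields[-1]))
            )
    return sentences


# Invented sample, not taken from the corpus.
SAMPLE = """S This are a sentence .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0
"""

parsed = parse_m2(SAMPLE)
```

Each `Sentence` then carries its token list and the edits that apply to it, so a corrected sentence can be rebuilt by replacing `tokens[start:end]` with the edit's `correction`.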

GECCC Grammar Error Correction Corpus for Czech

Grammar Error Correction Corpus for Czech (GECCC) consists of 83,058 sentences and covers four diverse domains, including essays written by native students, informal website...

GECCC Grammar Error Correction Corpus for Czech (2022-09-28)