Dataset - B2FIND

Dataset of Authentic and Synthetic Slovene Language Errors DASSLE 1.0

DASSLE 1.0 (Dataset of Authentic and Synthetic Slovene Language Errors) comprises 7,385 manually prepared entries, each consisting of a Slovene sentence containing a single,...

Developmental corpus Šolar 3.0

The Developmental corpus Šolar consists of 5,485 texts written by students in Slovenian secondary schools (age 15-19) and pupils in the 7th-9th grade of primary school (13-15),...

Developmental corpus Šolar 2.0

The Developmental corpus Šolar 2.0 consists of 5,485 texts written by students in Slovene secondary schools (age 15-19) and pupils in the 7th-9th grade of primary school...

QT21 Data

Post-editing and MQM annotations produced by the QT21 project. As described in @InProceedings{specia-etal_MTSummit:2017, author = {Specia, Lucia and Kim Harris and...

Corpus of comma placement Vejica 1.0

A collection of sentences demonstrating and correcting comma usage. The sentences come from four sources: - KUST: a Slovene learner corpus,...

Error-annotated developmental corpus Šolar 2.0 Error

The corpus contains 2,094 texts from the corpus Šolar 2.0 (http://hdl.handle.net/11356/1214), i.e., only those that contain error annotations. For each text, the...

Dataset for evaluation of Slovene spell- and grammar-checking tools Šolar-Eva...

Šolar-Eval is a specialized dataset designed for the evaluation of Slovene spell- and grammar-checking tools and methodologies. It encompasses 109 essays authored by Slovene...

Corpus of comma placement Vejica 1.3

A collection of sentences demonstrating and correcting comma usage. The sentences come from five sources: - KUST: a Slovene learner corpus,...

Learners' corpus Šolar 1.0

Šolar consists of 2,703 texts written by students in Slovene secondary schools (age 15-19) and pupils in the 7th-9th grade of primary school (13-15), with a small percentage...

Frequency list of language problems from Šolar 3.0

The dataset comprises 36,570 examples of student writing from Slovenian primary and secondary schools, together with authentic (teacher-provided) corrections of language problems...

Post-edited and error annotated machine translation corpus PE²rr 1.0

The PE²rr corpus contains source language texts from different domains along with their automatically generated translations into several morphologically rich languages, their...

11 datasets found