-
Dataset of Authentic and Synthetic Slovene Language Errors DASSLE 1.0
DASSLE 1.0 (Dataset of Authentic and Synthetic Slovene Language Errors) comprises 7,385 manually prepared entries, each consisting of a Slovene sentence containing a single,... -
Developmental corpus Šolar 3.0
The Developmental corpus Šolar consists of 5,485 texts written by students in Slovenian secondary schools (age 15-19) and pupils in the 7th-9th grade of primary school (13-15),... -
Developmental corpus Šolar 2.0
The Developmental corpus Šolar 2.0 consists of 5,485 texts written by students in Slovene secondary schools (age 15-19) and pupils in the 7th-9th grade of primary school... -
QT21 Data
Post-editing and MQM annotations produced by the QT21 project. As described in @InProceedings{specia-etal_MTSummit:2017, author = {Specia, Lucia and Kim Harris and... -
Corpus of comma placement Vejica 1.0
A collection of sentences demonstrating and correcting comma usage. The sentences come from four sources: - KUST: a Slovene learner corpus,... -
Error-annotated developmental corpus Šolar 2.0 Error
The corpus contains 2,094 texts from the corpus Šolar 2.0 (http://hdl.handle.net/11356/1214), i.e. only those in which error annotations can be found. For each text, the... -
Dataset for evaluation of Slovene spell- and grammar-checking tools Šolar-Eva...
Šolar-Eval is a specialized dataset designed for the evaluation of Slovene spell- and grammar-checking tools and methodologies. It encompasses 109 essays authored by Slovene... -
Corpus of comma placement Vejica 1.3
A collection of sentences demonstrating and correcting comma usage. The sentences come from five sources: - KUST: a Slovene learner corpus,... -
Learners' corpus Šolar 1.0
Šolar consists of 2,703 texts written by students in Slovene secondary schools (age 15-19) and pupils in the 7th-9th grade of primary school (13-15), with a small percentage... -
Frequency list of language problems from Šolar 3.0
The dataset comprises 36,570 examples of student writing from Slovenian primary and secondary schools, together with authentic (teacher-provided) corrections of language problems... -
Post-edited and error annotated machine translation corpus PE²rr 1.0
The PE²rr corpus contains source language texts from different domains along with their automatically generated translations into several morphologically rich languages, their...
