The Corpus of Slovenian school texts is a lemmatized and POS-tagged specialized corpus comprising 428 short school texts, written primarily by primary-school students in grades 1 to 5 between 2017 and 2020. The corpus contains approximately 95,000 tokens and was designed as one of the resources for compiling The School Dictionary of the Slovenian Language, which is being created as part of the project Franček Web Portal, Language Counselling for Slovene Teachers and School Dictionary of the Slovene Language. The corpus was lemmatized and POS-tagged with the Obeliks tagger (http://oznacevalnik.slovenscina.eu/Vsebine/Sl/ProgramskaOprema/Navodila.aspx) using JOS morphosyntactic descriptions. It is encoded in XML and complies with the TEI specifications as given in the CLARIN.SI customisation (https://github.com/clarinsi/TEI-schema).
Note that the corpus is integrated with the CLARIN.SI concordancers, but the version available on the concordancers is much larger than the TEI sample available for download.
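
As an illustration of how the lemma and POS annotation might be read from the TEI sample, the following is a minimal Python sketch. It assumes that tokens are encoded as <w> elements in the standard TEI namespace, with the lemma in a "lemma" attribute and the morphosyntactic description in an "ana" attribute; these attribute names, the "mte:" prefix, and the inline SAMPLE fragment are assumptions made for illustration and should be checked against the downloaded files and the CLARIN.SI schema.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; the corpus files are assumed to use it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment, NOT taken from the corpus. Element and attribute
# names (w, pc, lemma, ana) are assumed from common TEI practice.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p><s>
    <w lemma="pes" ana="mte:Ncmsn">Pes</w>
    <w lemma="lajati" ana="mte:Vmpr3s">laja</w>
    <pc ana="mte:Z">.</pc>
  </s></p></body></text>
</TEI>"""


def read_tokens(xml_text):
    """Return (word form, lemma, MSD) triples for every <w> element."""
    root = ET.fromstring(xml_text)
    tokens = []
    for w in root.iterfind(".//tei:w", TEI_NS):
        tokens.append((w.text, w.get("lemma"), w.get("ana")))
    return tokens


tokens = read_tokens(SAMPLE)
for form, lemma, msd in tokens:
    print(form, lemma, msd)
```

To process a downloaded file instead of the inline fragment, ET.parse(path).getroot() can replace ET.fromstring(xml_text). Punctuation (<pc>) is skipped in this sketch because it carries no lemma.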