The Corpus of scientific texts of contemporary Slovenian consists of 25 million words from scientific monographs and scientific papers written mainly between 2000 and 2023. It was designed as one of the resources of the project eSSKJ and corpus - towards state-of-the-art language data.
The corpus is linguistically annotated with the CLASSLA pipeline (https://github.com/clarinsi/classla/) at the levels of tokenization, sentence segmentation, lemmatization, MULTEXT-East v6 MSD-tags (https://nl.ijs.si/ME/V6/msd/html/msd-sl.html), JOS dependency syntax (https://nl.ijs.si/jos/bib/jos-skladnja-navodila.pdf), and named entities (https://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf).
The corpus is available in the CoNLL-U format, as well as vertical files for use with Sketch Engine type concordancers.