- Monitor corpus of Slovene Trendi 2025-11
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 59 publishers. Trendi 2025-11 covers the period from January...
- Monitor corpus of Slovene Trendi 2025-10
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 58 publishers. Trendi 2025-10 covers the period from January...
- CMC training corpus Janes-Tag 2.1
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence...
- CMC training corpus Janes-Norm 1.2
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation,...
- Slovene-Japanese Learner's Dictionary sloJa 1.1
The Slovenian-Japanese online dictionary for Slovenian speaking learners of Japanese was compiled by extracting and converting the Japanese-Slovenian dictionary jaSlo 3.1...
- Slovene learner corpus KOST 2.1
The corpus of Slovene as a foreign language KOST (Korpus slovenščine kot tujega jezika) contains 10,590 texts (almost 1.4 million words) written by adult speakers for whom...
- Monitor corpus of Slovene Trendi 2025-09
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 58 publishers. Trendi 2025-09 covers the period from January...
- Slovene-Japanese Learner's Dictionary sloJa 1.0
The Slovenian-Japanese online dictionary for Slovenian speaking learners of Japanese was compiled by extracting and converting the Japanese-Slovenian dictionary jaSlo 3.1...
- Slovene learner corpus KOST 2.0
The corpus of Slovene as a foreign language KOST (Korpus slovenščine kot tujega jezika) contains 8,347 texts (almost 1.3 million words) written by adult speakers for whom...
- Corpus of scientific texts of contemporary Slovenian KZB 1.0
The Corpus of scientific texts of contemporary Slovenian consists of 25 million words from scientific monographs and scientific papers written mainly between 2000 and 2023. It...
- Slovene learner corpus KOST 1.0
The corpus of Slovene as a foreign language KOST (Korpus slovenščine kot tujega jezika) contains 6,311 texts (just over 1 million words) written by adult speakers for whom...
