South Slavic web corpus collection CLASSLA-web 2.0
The CLASSLA-web 2.0 collection is a large-scale, comparable set of web corpora covering all seven South Slavic languages: Slovenian, Croatian, Bosnian, Montenegrin, Serbian,...
Multilingual training dataset for CAP policy topic classification ParlaCAP-train
The multilingual training dataset for CAP policy topic classification ParlaCAP-train is a collection of parliamentary speeches in 29 European languages, automatically annotated...
Ontology of topics for Slovenian as a second and foreign language ONTEM 1.0
ONTEM 1.0 comprises 1,019 manually prepared entries, each consisting of information about the lemma, part-of-speech (following the MULTEXT-East tagset for Slovenian,...
