Dataset - B2FIND

South Slavic web corpus collection CLASSLA-web 2.0

The CLASSLA-web 2.0 collection is a large-scale, comparable set of web corpora covering all seven South Slavic languages: Slovenian, Croatian, Bosnian, Montenegrin, Serbian,...

Multilingual training dataset for CAP policy topic classification ParlaCAP-train

The multilingual training dataset for CAP policy topic classification ParlaCAP-train is a collection of parliamentary speeches in 29 European languages, automatically annotated...

Ontology of topics for Slovenian as a second and foreign language ONTEM 1.0

ONTEM 1.0 comprises 1,019 manually prepared entries, each consisting of information about the lemma, part-of-speech (following the MULTEXT-East tagset for Slovenian,...

You can also access this registry using the API (see API Docs).

3 datasets found