1,344 datasets found

Language: Slovene

  • Morphological lexicon Sloleks 2.0

    Sloleks is the reference morphological lexicon for the Slovenian language, developed for use in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains...
  • Morphological lexicon Sloleks 1.2

    Sloleks is the reference morphological lexicon for the Slovenian language, developed for use in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains...
  • Dataset of annotated collocation-distractor pairs COLLDIST

    The dataset contains 59,598 collocation-distractor pairs for 2,856 headwords. A distractor is defined as an incorrect answer/alternative to a collocation, which can be similar to...
  • Terminological dictionary of papermaking

    This digital dictionary of papermaking was made on the basis of the printed edition, i.e. Marjeta Humar (ed.) Papirniški terminološki slovar. 1996. ZRC SAZU...
  • Dataset of annotated headword-synonym-distractor triplets SYNDIST

    The dataset contains 51,023 headword-synonym-distractor triplets for 5,000 headwords. A distractor is defined as an incorrect answer/alternative to a synonym, which can be similar...
  • Tourism Corpus TURK 3.0

    The Tourism Corpus TURK 3.0 is a multilingual corpus of tourism-related texts in Slovenian, accompanied by some texts (about 6% of the corpus) in English, Italian and German....
  • Slovene instruction-following dataset for large language models GaMS-Instruct...

    GaMS-Instruct-MED is an instruction-following dataset designed to fine-tune Slovene large language models to follow instructions in the medical domain. It consists of units of...
  • Parallel sense-annotated corpus ELEXIS-WSD 1.3

    ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.3 contains sentences for 10...
  • Corpus of spoken Slovenian ROG-Dialog 1.0

    The corpus of spoken Slovenian ROG-Dialog consists of volunteered audio, recorded by students who asked their relatives or acquaintances to talk on record in their homes. The...
  • Monitor corpus of Slovene Trendi 2025-11

    The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 59 publishers. Trendi 2025-11 covers the period from January...
  • Monitor corpus of Slovene Trendi 2025-10

    The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 58 publishers. Trendi 2025-10 covers the period from January...
  • Universal Dependencies 2.17 models for UDPipe 2 (2025-11-25)

    Tokenizer, POS Tagger, Lemmatizer and Parser models for 169 treebanks of 93 languages of Universal Dependencies 2.17 Treebanks, created solely using UD 2.17 data...
  • Corpus of conversational humor Krohot 1.0

    The KROHOT corpus consists of 10 audio recordings of private, spontaneous conversations between two or three speakers, with a total duration of 232 minutes. Most recordings were...
  • CMC training corpus Janes-Tag 2.1

    Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence...
  • CMC training corpus Janes-Norm 1.2

    Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation,...
  • Slovensko javno mnenje 2014

    Slovensko javno mnenje (SJM, Slovenian Public Opinion) 2014 is a survey conducted as part of the European Social Survey 2014 (ESS 7). ESS is an academically led international survey that was...
  • Slovensko javno mnenje 2011/2

    The main part of the questionnaire consists of questions from the World Values Survey, specifically its 6th wave. The remainder of the questionnaire is the Mirror of Public Opinion (Ogledalo javnega mnenja)....
  • Slovensko javno mnenje 2011/1

    Slovensko javno mnenje 2011/1 is a set of surveys that are part of the Slovensko javno mnenje series. It is the first of the two survey rounds conducted in 2011. The questionnaire includes 4 thematic sections,...
  • Slovene-Japanese Learner's Dictionary sloJa 1.1

    The Slovenian-Japanese online dictionary for Slovenian-speaking learners of Japanese was compiled by extracting and converting the Japanese-Slovenian dictionary jaSlo 3.1...
  • Ontology of topics for Slovenian as a second and foreign language ONTEM 1.0

    ONTEM 1.0 comprises 1,019 manually prepared entries, each consisting of information about the lemma, part-of-speech (following the MULTEXT-East tagset for Slovenian,...
You can also access this registry using the API (see API Docs).