Thesaurus of Modern Slovene 2.2

Thesaurus of Modern Slovene is the largest automatically generated open-access collection of Slovene synonyms. The current version, 2.2, contains 102,068 keywords and 362,464 synonyms. Nearly 6,000 entries also contain antonyms. The Thesaurus includes two types of dictionary entries. Most are generated entirely by automatic processes; a growing number, however, have manually divided senses, with synonyms categorized under the relevant sense. Version 2.2 contains 7,465 sense-divided headwords, and 141 headwords contain sense-divided antonyms.

The original data for the Thesaurus came from two principal language resources: the Oxford®-DZS Comprehensive English-Slovenian Dictionary and the Gigafida 1.0 corpus of written Slovene. The links identified between synonyms were additionally confirmed using the Dictionary of Standard Slovenian Language (SSKJ). Data extraction and structuring were based on the frequency and manner in which words co-occur in translation strings of the Oxford-DZS Dictionary. This information is the basis for distinguishing ‘core’ from ‘near’ synonyms, with ‘core’ synonyms being more strongly connected to the keyword. In the next step, an approach combining balanced co-occurrence graphs and the Personalized PageRank algorithm automatically divides the synonyms into subgroups and ranks them by their degree of semantic relatedness to the keyword and by their frequency in language use. For the creation methodology, see Krek et al. (2017) in the provided references.
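
The ranking step described above can be illustrated with a minimal sketch of Personalized PageRank on a small weighted co-occurrence graph. This is not the Thesaurus pipeline itself: the function name `personalized_pagerank`, the example words, the edge weights, and the damping factor of 0.85 are all assumptions made for illustration, and the graph balancing and subgroup division mentioned in the description are not modelled. See Krek et al. (2017) for the actual method.

```python
from collections import defaultdict


def personalized_pagerank(edges, seed, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank on an undirected weighted graph.

    All restart probability is placed on `seed` (the headword), so the
    scores reflect relatedness to that one node rather than global centrality.
    """
    # Build a symmetric weighted adjacency map from (a, b, weight) triples.
    graph = defaultdict(dict)
    for a, b, weight in edges:
        graph[a][b] = graph[a].get(b, 0.0) + weight
        graph[b][a] = graph[b].get(a, 0.0) + weight
    if seed not in graph:
        raise ValueError(f"seed {seed!r} does not occur in any edge")

    # Start with all probability mass on the seed node.
    rank = {node: (1.0 if node == seed else 0.0) for node in graph}
    for _ in range(iterations):
        updated = {node: 0.0 for node in graph}
        for node, neighbours in graph.items():
            total = sum(neighbours.values())
            # Spread this node's mass to neighbours in proportion to weight.
            for other, weight in neighbours.items():
                updated[other] += damping * rank[node] * weight / total
        # Restart step: the remaining mass always returns to the seed.
        updated[seed] += 1.0 - damping
        rank = updated
    return rank


# Invented toy data: weights stand in for co-occurrence counts
# in translation strings; they are NOT taken from the Thesaurus.
edges = [
    ("hiter", "uren", 5),
    ("hiter", "nagel", 3),
    ("hiter", "brz", 1),
    ("uren", "nagel", 2),
    ("brz", "nagel", 1),
]

scores = personalized_pagerank(edges, seed="hiter")
# Rank candidate synonyms (everything except the headword) by score.
ranked = sorted((w for w in scores if w != "hiter"),
                key=scores.get, reverse=True)
print(ranked)
```

Because the restart mass always returns to the headword, candidates that are strongly and directly linked to it score higher than candidates reached only through weak or indirect links, which is the intuition behind ranking synonyms by relatedness to the keyword.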

Identifier
PID http://hdl.handle.net/11356/2092
Related Identifier https://elex.link/elex2023/wp-content/uploads/82.pdf
Related Identifier https://elex.link/elex2017/wp-content/uploads/2017/09/paper05.pdf
Related Identifier http://hdl.handle.net/11356/1916
Related Identifier https://viri.cjvt.si/sopomenke/eng/about
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/2092
Provenance
Creator Krek, Simon; Laskowski, Cyprian; Robnik-Šikonja, Marko; Kosem, Iztok; Arhar Holdt, Špela; Gantar, Polona; Čibej, Jaka; Gorjanc, Vojko; Klemenc, Bojan; Dobrovoljc, Kaja; Pori, Eva; Roblek, Rebeka; Zgaga, Karolina; Kamenšek, Urška; Ponikvar, Primož; Šešet, Jure; Zaranšek, Petra
Publisher Centre for Language Resources and Technologies, University of Ljubljana
Publication Year 2026
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type lexicalConceptualResource
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics