The Slovene opinion lexicon KSS is based on a manual translation of the opinion lexicon of Hu & Liu (2004), extended with positive and negative words typical of the Slovene language. The lexicon comes in three versions:
- A lexicon of all word forms, expanded with Sloleks, a lexicon of Slovene word forms. It contains 90,620 entries: 62,941 negative and 27,679 positive word forms.
- A lexicon of lemmas only, with 5,125 negative and 1,911 positive words.
- The original version used in Kadunc & Robnik-Šikonja (2016), with 6,687 negative and 2,645 positive entries.
Each version consists of two plain-text files, one for negative and one for positive words, with one entry per line. The lexicon also contains some multi-word units whose words are joined with an underscore, e.g. "bolezenska_znamenja" ("symptoms of a disease").
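Given this format, reading one of the files takes only a few lines. The Python sketch below is an illustration under assumptions: the files are assumed to be UTF-8 encoded (pass another `encoding` if they are not), and the function names are hypothetical. Multi-word units are kept with their underscores so they can later be matched against underscore-joined tokens.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def parse_word_list(lines: Iterable[str]) -> set[str]:
    """Return the set of entries in a one-entry-per-line word list.

    Surrounding whitespace and blank lines are dropped. Multi-word units
    such as "bolezenska_znamenja" are kept as-is, underscores included.
    """
    entries: set[str] = set()
    for line in lines:
        entry = line.strip()
        if entry:
            entries.add(entry)
    return entries


def load_word_list(path: str | Path, encoding: str = "utf-8") -> set[str]:
    """Read one lexicon file from disk.

    Assumption: the file is UTF-8 encoded; override `encoding` otherwise.
    """
    with open(path, encoding=encoding) as f:
        return parse_word_list(f)
```

For example, `negative = load_word_list("negative.txt")` and `positive = load_word_list("positive.txt")` would load one version of the lexicon; both file names are hypothetical placeholders for the actual file names of the chosen version.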
The KSS lexicon was developed as part of a BSc thesis (Kadunc, 2016) and evaluated empirically on a corpus of web comments on various topics (business, politics, sport and others) from four Slovene web portals (RtvSlo, 24ur, Finance, Reporter). The corpus is available from http://hdl.handle.net/11356/1115.
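One simple way to apply such a lexicon to comments like these is to count positive and negative matches per comment and compare the two counts. The Python sketch below does that; it is a generic lexicon-counting baseline, not necessarily the method evaluated in the thesis, and the tokenizer (lowercasing plus a `\w+` regular expression) is an assumption, as are the function names and the toy word lists. Multi-word units are matched by joining consecutive tokens with an underscore, trying the longest candidate first.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase the text and keep runs of word characters;
    # \w matches Unicode letters, so č, š and ž are preserved.
    # Lexicon entries are assumed to be lowercase as well.
    return re.findall(r"\w+", text.lower())


def count_hits(tokens: list[str], lexicon: set[str], max_n: int | None = None) -> int:
    """Count lexicon matches, preferring the longest underscore-joined n-gram."""
    if max_n is None:
        # Length, in words, of the longest multi-word unit in the lexicon.
        max_n = max((entry.count("_") + 1 for entry in lexicon), default=1)
    hits = 0
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            if "_".join(tokens[i:i + n]) in lexicon:
                hits += 1
                i += n  # skip the words consumed by this match
                break
        else:
            i += 1  # no entry starts at this token
    return hits


def classify(text: str, positive: set[str], negative: set[str]) -> str:
    """Label a comment by comparing positive and negative match counts."""
    tokens = tokenize(text)
    pos = count_hits(tokens, positive)
    neg = count_hits(tokens, negative)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


# Toy lexicons for illustration only; real entries come from the KSS files.
positive = {"dober", "odličen"}
negative = {"slab", "bolezenska_znamenja"}
print(classify("Opazili so bolezenska znamenja.", positive, negative))  # prints negative
```

For the 90,620-entry word-form version, computing `max_n` once per lexicon and passing it in avoids rescanning the set for every comment. The lemma-only version would additionally require lemmatizing the comments first, which this sketch does not do.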
References:
1. Minqing Hu and Bing Liu (2004). Mining opinion features in customer reviews. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 4, pp. 755–760. http://www.aaai.org/Papers/AAAI/2004/AAAI04-119.pdf
2. Klemen Kadunc (2016). Določanje sentimenta slovenskim spletnim komentarjem s pomočjo strojnega učenja [Determining the sentiment of Slovene web comments using machine learning]. BSc thesis, University of Ljubljana, Faculty of Computer and Information Science (in Slovene). http://eprints.fri.uni-lj.si/3317/
3. Klemen Kadunc and Marko Robnik-Šikonja (2016). Analiza mnenj s pomočjo strojnega učenja in slovenskega leksikona sentimenta [Opinion analysis using machine learning and a Slovene sentiment lexicon]. Conference on Language Technologies & Digital Humanities, Ljubljana (in Slovene). http://www.sdjt.si/wp/wp-content/uploads/2016/09/JTDH-2016_Kadunc-et-al_Analiza-mnenj-s-pomocjo-strojnega-ucenja.pdf