Slovene sentiment lexicon KSS 1.1

The Slovene opinion lexicon KSS is based on a manual translation of the opinion lexicon of Hu & Liu (2004). It was extended with positive and negative words typical of the Slovene language. The lexicon is available in three versions:

  1. A lexicon containing all word forms, extended using Sloleks, a lexicon of Slovene word forms. It contains 90,620 entries: 62,941 negative and 27,679 positive word forms.
  2. A lexicon containing only lemmas: 5,125 negative and 1,911 positive words.
  3. The original version used in Kadunc & Robnik-Šikonja (2016), containing 6,687 negative and 2,645 positive entries.

Each version consists of two plain-text files, one with negative and one with positive words, one word per line. The lexicon also contains some multi-word units, in which the individual words are joined with an underscore, e.g. "bolezenska_znamenja" ("symptoms").
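The file layout described above (one entry per line, multi-word units joined with "_") can be read with a few lines of Python. This is a minimal sketch, not part of the distribution: it assumes UTF-8 encoding (the repository metadata lists text/plain; charset=utf-8), and the helper names `parse_word_list`, `load_word_list` and `split_multiword` are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def parse_word_list(lines: Iterable[str]) -> set[str]:
    """Collect one lexicon entry per line, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def load_word_list(path: Path) -> set[str]:
    """Read one lexicon file (assumed UTF-8).

    'utf-8-sig' also strips a byte-order mark if the file happens to have one.
    """
    with path.open(encoding="utf-8-sig") as fh:
        return parse_word_list(fh)


def split_multiword(entry: str) -> tuple[str, ...]:
    """Split a multi-word unit such as 'bolezenska_znamenja' into its words.

    Single-word entries come back as a one-element tuple.
    """
    return tuple(entry.split("_"))
```

For example, `load_word_list(Path("negative.txt"))` would return the set of negative entries; the file name here is hypothetical, so use the names found in the actual download.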

The KSS lexicon was developed as part of a BSc thesis (Kadunc, 2016) and evaluated empirically on a corpus of web comments on various topics (business, politics, sport and others) from four Slovene web portals (RtvSlo, 24ur, Finance, Reporter). The corpus is available from http://hdl.handle.net/11356/1115
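The description does not say how the lexicon was applied in that evaluation. As an illustration only, the following Python sketch shows a generic count-based scorer (positive matches minus negative matches); it is not the method of Kadunc (2016). It assumes lowercase lexicon entries and is meant for the word-form version, so inflected forms can be matched directly without lemmatization. Multi-word units are matched by joining consecutive tokens with an underscore, the convention the lexicon files use. The function names and the `max_ngram` parameter are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of Unicode word characters (so č, š, ž are kept).
    return re.findall(r"\w+", text.lower())


def polarity_score(text: str, positive: set[str], negative: set[str],
                   max_ngram: int = 3) -> int:
    """Return (#positive matches) - (#negative matches).

    At each position the longest n-gram (up to max_ngram tokens, joined
    with '_') found in either set is counted, and its tokens are consumed.
    """
    tokens = tokenize(text)
    score = 0
    i = 0
    while i < len(tokens):
        step = 1  # advance one token if nothing matches here
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            candidate = "_".join(tokens[i:i + n])
            if candidate in positive:
                score += 1
            elif candidate in negative:
                score -= 1
            else:
                continue
            step = n  # skip all tokens of the matched unit
            break
        i += step
    return score
```

With toy sets that are not actual lexicon contents, `polarity_score("Bolezenska znamenja so slab znak.", {"dober"}, {"slab", "bolezenska_znamenja"})` returns -2: the multi-word unit and "slab" each count once. A count-based score ignores negation and intensifiers, so it is best read as a baseline feature rather than a classifier.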

References:

  1. Minqing Hu and Bing Liu (2004). Mining opinion features in customer reviews. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 4, pp. 755–760. http://www.aaai.org/Papers/AAAI/2004/AAAI04-119.pdf
  2. Klemen Kadunc (2016). Določanje sentimenta slovenskim spletnim komentarjem s pomočjo strojnega učenja [Determining the sentiment of Slovene web comments using machine learning]. BSc thesis, University of Ljubljana, Faculty of Computer and Information Science (in Slovene). http://eprints.fri.uni-lj.si/3317/
  3. Klemen Kadunc and Marko Robnik-Šikonja (2016). Analiza mnenj s pomočjo strojnega učenja in slovenskega leksikona sentimenta [Opinion analysis using machine learning and a Slovene sentiment lexicon]. Conference on Language Technologies & Digital Humanities, Ljubljana (in Slovene). http://www.sdjt.si/wp/wp-content/uploads/2016/09/JTDH-2016_Kadunc-et-al_Analiza-mnenj-s-pomocjo-strojnega-ucenja.pdf

Identifier
PID http://hdl.handle.net/11356/1097
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1097
Provenance
Creator Kadunc, Klemen; Robnik-Šikonja, Marko
Publisher Faculty of Computer and Information Science, University of Ljubljana
Publication Year 2017
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); PUB; https://creativecommons.org/licenses/by/4.0/
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type lexicalConceptualResource
Format text/plain; charset=utf-8; application/pdf (8 downloadable files)
Discipline Linguistics