Automatically sentiment annotated Slovenian news corpus AutoSentiNews 1.0

The corpus contains 256,567 documents from the Slovenian news portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals publish political, business, economic and financial content. The submission contains 7 files. Five of them, each named after its news portal, contain raw news in txt format retrieved with R crawlers for five Slovenian web media 1.0 (http://hdl.handle.net/11356/1105). The file AutoSentiNews consists of 5 text files that contain 256,567 news articles annotated as positive, negative or neutral at the document level. Of these, 10,427 were manually annotated (cf. Manually sentiment annotated Slovenian news corpus SentiNews 1.0, http://hdl.handle.net/11356/1110) and the remaining 246,140 were annotated automatically. The file SloStopWords contains 1,784 stop words for Slovene.

Identifier
PID http://hdl.handle.net/11356/1109
Related Identifier https://doi.org/10.1007/s10579-018-9413-3
Related Identifier https://github.com/19Joey85/Sentiment-annotated-news-corpus-and-sentiment-lexicon-in-Slovene/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1109
Provenance
Creator Bučar, Jože
Publisher Faculty of Information Studies Novo mesto
Publication Year 2017
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type corpus
Format text/plain; charset=utf-8; text/plain; application/zip; downloadable_files_count: 8
Discipline Linguistics
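
The Metadata Access entry above is an OAI-PMH GetRecord request for this record in the oai_dc format. As a minimal sketch, the same URL can be assembled and fetched in Python using only the standard library. The helper names build_getrecord_url and fetch_record are hypothetical and introduced here for illustration only; the endpoint and identifier are copied from the record, and the sketch assumes the response is UTF-8 encoded XML.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and identifier copied from the Metadata Access entry of this record.
OAI_ENDPOINT = "http://www.clarin.si/repository/oai/request"
RECORD_ID = "oai:www.clarin.si:11356/1109"


def build_getrecord_url(identifier, metadata_prefix="oai_dc", endpoint=OAI_ENDPOINT):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    }
    # Keep ':' and '/' unescaped so the URL matches the form shown in the record.
    return endpoint + "?" + urlencode(query, safe=":/")


def fetch_record(identifier, timeout=30):
    """Download the record as raw XML text (requires network access).

    Assumes the server answers with UTF-8 encoded XML.
    """
    with urlopen(build_getrecord_url(identifier), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_record(RECORD_ID) to download.
    print(build_getrecord_url(RECORD_ID))
```

Calling fetch_record(RECORD_ID) would return the Dublin Core record as an XML string, which can then be parsed with any XML library.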