Sentiment Annotated Dataset of Croatian News

We present a collection of sentiment annotations for news articles (article links) in the Croatian language. A set of 2,025 news articles was gathered from 24sata, one of the leading media companies in Croatia with the highest circulation. Six annotators annotated the articles at the document level using a five-level Likert scale (1 = very negative, 2 = negative, 3 = neutral, 4 = positive, 5 = very positive). The final sentiment of an instance was defined as the average of the sentiment scores given by the different annotators. An instance was labeled negative if the average score was less than or equal to 2.4, neutral if the average score was strictly between 2.4 and 3.6, and positive if the average score was greater than or equal to 3.6. The annotation guidelines correspond to those of the Slovenian sentiment-annotated news collection SentiNews 1.0 (http://hdl.handle.net/11356/1110).
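The labeling rule described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not code distributed with the dataset; the function name `label_from_scores` and the input format (a list of integer scores from 1 to 5, one per annotator) are assumptions made for the example.

```python
from fractions import Fraction

# Thresholds taken from the dataset description (2.4 and 3.6),
# stored as exact fractions to avoid floating-point edge cases.
NEGATIVE_MAX = Fraction(12, 5)   # 2.4
POSITIVE_MIN = Fraction(18, 5)   # 3.6


def label_from_scores(scores):
    """Map annotator scores (hypothetical input: integers 1-5) to a label."""
    if not scores:
        raise ValueError("at least one annotator score is required")
    if any(s not in (1, 2, 3, 4, 5) for s in scores):
        raise ValueError("scores must be integers on the 1-5 Likert scale")

    # Final sentiment is the average of all annotator scores.
    average = Fraction(sum(scores), len(scores))

    if average <= NEGATIVE_MAX:
        return "negative"
    if average >= POSITIVE_MIN:
        return "positive"
    return "neutral"


if __name__ == "__main__":
    # Six invented annotator scores for one article, for illustration only.
    print(label_from_scores([2, 3, 2, 2, 1, 2]))
```

Exact fractions are used here so that an average landing exactly on 2.4 or 3.6 is assigned to the negative or positive class as the description specifies, regardless of floating-point rounding.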

If you use the dataset, please cite the following paper, which also describes the dataset creation and the monolingual and cross-lingual sentiment classification experiments: Pelicon, A.; Pranjić, M.; Miljković, D.; Škrlj, B.; Pollak, S. Zero-Shot Learning for Cross-Lingual News Sentiment Classification. Appl. Sci. 2020, 10, 5993. https://doi.org/10.3390/app10175993

Identifier
PID http://hdl.handle.net/11356/1342
Related Identifier https://doi.org/10.3390/app10175993
Related Identifier http://embeddia.eu/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1342
Provenance
Creator Pelicon, Andraž; Pranjić, Marko; Miljković, Dragana; Škrlj, Blaž; Pollak, Senja
Publisher Jožef Stefan Institute
Publication Year 2020
Funding Reference info:eu-repo/grantAgreement/EC/H2020/825153
Rights Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); https://creativecommons.org/licenses/by-nc-nd/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Croatian
Resource Type corpus
Format application/octet-stream; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics