News sentiment analysis datasets for Serbian, Bosnian, Macedonian, Albanian and Estonian SADEmma 1.0

PID

We provide annotated datasets on a three-point sentiment scale (positive, neutral and negative) for Serbian, Bosnian, Macedonian, Albanian, and Estonian. For all languages except Estonian, we include pairs of source URL (where corresponding text can be found) and sentiment label.

For Estonian, we randomly sampled 100 articles from "Ekspress news article archive (in Estonian and Russian) 1.0" (http://hdl.handle.net/11356/1408).

The data is organized in Tab-Separated Values (TSV) format. For Serbian, Bosnian, Macedonian, and Albanian, the dataset contains two columns: sourceURL and sentiment. For Estonian, the dataset consists of three columns: text ID (from the CLARIN.SI reference above), body text, and sentiment label.

Identifier
PID http://hdl.handle.net/11356/1987
Related Identifier https://emma.ijs.si/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1987
Provenance
Creator Ivačič, Nikola; Pelicon, Andraž; Koloski, Boshko; Pollak, Senja; Purver, Matthew
Publisher Jožef Stefan Institute
Publication Year 2024
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Bosnian; Serbian; Macedonian; Albanian; Estonian
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream; downloadable_files_count: 5
Discipline Linguistics