Dataset - B2FIND

Corpus of spoken Slovenian ROG-Dialog 1.0

The corpus of spoken Slovenian ROG-Dialog consists of volunteered audio recorded by students, who asked their relatives or acquaintances to talk on record in their homes. The...

Spoken corpora of parliamentary debates ParlaSpeech 3.0

The ParlaSpeech corpora are built from the transcripts of parliamentary proceedings of Croatian, Serbian, Polish, and Czech parliaments available in the ParlaMint 4.0 corpus...

Manually sentiment annotated Slovenian news corpus SentiNews 1.0

Between 2 and 6 annotators independently annotated a stratified random sample of 10,427 documents for sentiment from the Slovenian news portals 24ur, Dnevnik, Finance, Rtvslo, and...

Twitter sentiment for 15 European languages

The dataset contains over 1.6 million tweets (tweet IDs) labeled with sentiment by human annotators. There are 15 Twitter corpora, one for each of the 15 European languages....

xLiMe Twitter Corpus XTC 1.0.1

The xLiMe Twitter Corpus contains tweets in German, Italian and Spanish manually annotated with part-of-speech, named entities, and message-level sentiment polarity. In total,...

The sentiment corpus of parliamentary debates ParlaSent-BCS v1.0

The dataset consists of mid-length sentences from the Bosnian, Croatian and Serbian parliamentary proceedings, annotated with a 6-level sentiment schema (defined below). The...

News sentiment analysis datasets for Serbian, Bosnian, Macedonian, Albanian a...

We provide annotated datasets on a three-point sentiment scale (positive, neutral and negative) for Serbian, Bosnian, Macedonian, Albanian, and Estonian. For all languages...

Emoji Sentiment Ranking 1.0

A lexicon of 751 emoji characters with automatically assigned sentiment. The sentiment is computed from 70,000 tweets, labeled by 83 human annotators in 13 European languages....

Sentiment Annotated Dataset of Croatian News

We present a collection of sentiment annotations for news articles (article links) in Croatian. A set of 2,025 news articles was gathered from 24sata, one of the leading...

The multilingual sentiment dataset of parliamentary debates ParlaSent 1.0

The dataset consists of mid-length sentences from the parliamentary proceedings of Bosnia and Herzegovina, Croatia, Czechia, Serbia, Slovakia, Slovenia, and the United Kingdom,...

Automatically sentiment annotated Slovenian news corpus AutoSentiNews 1.0

The corpus contains 256,567 documents from the Slovenian news portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals contain political, business, economic and...

Slovene corpus for aspect-based sentiment analysis - SentiCoref 1.0

The SentiCoref 1.0 corpus consists of 837 documents selected from the SentiNews 1.0 corpus (http://hdl.handle.net/11356/1110). The documents were selected based on the number of...

EMBEDDIA tools output example corpus of Estonian, Croatian and Latvian news a...

This dataset contains articles from EMBEDDIA Media partners with various information added by the tools developed within the EMBEDDIA project: - 12,390 Estonian articles from...

13 datasets found