Hate speech dataset annotated for Portuguese

Dataset

Portuguese Hate Speech Twitter Dataset is a collection of Twitter messages manually annotated for hate speech. It comprises 5,668 messages collected on Twitter from 1,156 distinct users, annotated using a hierarchical structure of classes under a multiclass, multilabel approach. The dataset is provided in two formats, together with the class hierarchy. The text of the tweets is omitted from this dataset to comply with the terms and conditions of the Twitter API.

Identifier
DOI	https://doi.org/10.23728/b2share.9005efe2d6be4293b63c3cffd4cf193e
Source	https://b2share.eudat.eu/records/9005efe2d6be4293b63c3cffd4cf193e
Metadata Access	https://b2share.eudat.eu/api/oai2d?verb=GetRecord&metadataPrefix=eudatcore&identifier=oai:b2share.eudat.eu:b2rec/9005efe2d6be4293b63c3cffd4cf193e
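
The Metadata Access link above is an OAI-PMH GetRecord request. As a minimal sketch of how that URL is composed from the record identifier, the Python below rebuilds it with the standard library only and makes no network call. The helper name build_getrecord_url and its parameter names are hypothetical, chosen for illustration; the endpoint, metadata prefix, and identifier are taken from the fields listed here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Values copied from the Identifier fields of this record.
BASE = "https://b2share.eudat.eu/api/oai2d"
RECORD_ID = "9005efe2d6be4293b63c3cffd4cf193e"


def build_getrecord_url(base, record_id, prefix="eudatcore"):
    """Compose an OAI-PMH GetRecord URL (hypothetical helper, illustration only)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": prefix,
        "identifier": f"oai:b2share.eudat.eu:b2rec/{record_id}",
    }
    # Keep ':' and '/' unescaped so the result matches the link as listed.
    return f"{base}?{urlencode(params, safe=':/')}"


url = build_getrecord_url(BASE, RECORD_ID)

# Split the query string back into its three OAI-PMH arguments.
query = parse_qs(urlsplit(url).query)
print(url)
print(query["verb"][0], query["metadataPrefix"][0])
```

Fetching the URL with any HTTP client should return the record's metadata as XML; the response layout is not described in this record, so parsing it is left out of the sketch.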

Provenance
Creator	Paula Fortuna
Publisher	EUDAT B2SHARE; INESC TEC
Publication Year	2017
Rights	info:eu-repo/semantics/openAccess
OpenAccess	true

Representation
Language	English
Format	json; rdf; csv; zip; txt
Size	1.3 MB; 8 files
Discipline	Other