Hate speech dataset annotated for Portuguese

The Portuguese Hate Speech Twitter Dataset is a collection of Twitter messages manually annotated for hate speech using a hierarchical structure of classes. It comprises 5,668 messages collected from 1,156 distinct users, each classified for hate speech following a multiclass, multilabel approach. The dataset is provided in two formats, together with the hierarchy of classes. The text of the tweets is omitted in accordance with the terms and conditions of the Twitter API.

Identifier
DOI https://doi.org/10.23728/b2share.9005efe2d6be4293b63c3cffd4cf193e
Source https://b2share.eudat.eu/records/9005efe2d6be4293b63c3cffd4cf193e
Metadata Access https://b2share.eudat.eu/api/oai2d?verb=GetRecord&metadataPrefix=eudatcore&identifier=oai:b2share.eudat.eu:b2rec/9005efe2d6be4293b63c3cffd4cf193e
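
The Metadata Access link above is an OAI-PMH GetRecord request. As a minimal sketch, the same URL can be assembled from its parts in Python; the helper name build_metadata_url and the constant names are illustrative assumptions, not part of B2SHARE, and no network request is made here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Values copied from the record above; the constant names are illustrative.
BASE_URL = "https://b2share.eudat.eu/api/oai2d"
RECORD_ID = "9005efe2d6be4293b63c3cffd4cf193e"


def build_metadata_url(record_id: str, metadata_prefix: str = "eudatcore") -> str:
    """Assemble an OAI-PMH GetRecord URL for a B2SHARE record (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:b2share.eudat.eu:b2rec/{record_id}",
    }
    # safe=":/" leaves ':' and '/' unescaped so the result matches the listed URL.
    return f"{BASE_URL}?{urlencode(params, safe=':/')}"


url = build_metadata_url(RECORD_ID)

# Parse the query string back to confirm the three OAI-PMH arguments.
query = parse_qs(urlsplit(url).query)
print(url)
print(query["verb"], query["metadataPrefix"], query["identifier"])
```

Building the query from a dictionary keeps the three request arguments (verb, metadataPrefix, identifier) explicit, so another metadata prefix could be passed in if the endpoint supports it.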
Provenance
Creator Paula Fortuna
Publisher EUDAT B2SHARE; INESC TEC
Publication Year 2017
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Representation
Language English
Format json; rdf; csv; zip; txt
Size 1.3 MB; 8 files
Discipline Other