DALC - Dutch Abusive Language Corpus

This repository contains the full-text version of the Dutch Abusive Language Corpus (DALC), which is composed of tweets in Dutch. Each entry in the corpus has the following fields:

- unique numeric id of the message
- full text of the message, anonymised
- annotation for abusive language
- annotation for offensive language
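The four-field layout described above could be read along the following lines. This is a minimal sketch, not the corpus's documented loading procedure: the column names `id`, `text`, `abusive` and `offensive` are hypothetical (this record does not name the columns), the presence of a header row is assumed, and the sample rows are placeholders rather than corpus data.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical column names: the description lists four fields but does not
# name them. Adjust these to match the header of the actual file.
FIELDS = ("id", "text", "abusive", "offensive")


@dataclass
class Message:
    id: int
    text: str
    abusive: str
    offensive: str


def read_messages(lines: Iterable[str]) -> List[Message]:
    """Parse CSV lines (header row assumed) into Message records."""
    reader = csv.DictReader(lines)
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [
        Message(int(row["id"]), row["text"], row["abusive"], row["offensive"])
        for row in reader
    ]


# Placeholder rows for illustration only; not taken from the corpus.
sample = io.StringIO(
    "id,text,abusive,offensive\n"
    '1,"placeholder message, with a comma",LABEL_A,LABEL_B\n'
    "2,another placeholder,LABEL_C,LABEL_D\n"
)
messages = read_messages(sample)
```

For a file on disk, the same function can be given an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to one of the CSV files in the dataset.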

A full description of the dataset and accompanying data statement is available at https://github.com/tommasoc80/DALC.

Twitter

Identifier
DOI https://doi.org/10.34894/HOINL3
Metadata Access https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/HOINL3
Provenance
Creator Caselli, Tommaso; Weultjes, Marieke; Schelhaas, Arjan; Leistra, Folkert; van der Veen, Hylke; Robben, Menno; Timmerman, Gerben; Ruitenbeek, Waard; Zwart, Victor; van der Noord, Robin; Gnezdilov, Zhenja; Theodoridis, Dionysios
Publisher DataverseNL
Contributor Groningen Digital Competence Centre
Publication Year 2023
Rights info:eu-repo/semantics/restrictedAccess
OpenAccess false
Contact Groningen Digital Competence Centre (University of Groningen)
Representation
Resource Type Dataset
Format text/csv; application/pdf
Size 226627; 579246; 1292778; 1009665; 80227; 267168; 819870
Version 1.2
Discipline Humanities
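
The Metadata Access entry above is an OAI-PMH `GetRecord` request. As a small sketch of how such a URL is assembled from the DOI, the snippet below rebuilds it with the standard library. The function name `get_record_url` is illustrative, and the endpoint and `oai_datacite` prefix are copied from this record.

```python
from urllib.parse import urlencode

# Base URL taken from the Metadata Access entry of this record.
OAI_ENDPOINT = "https://dataverse.nl/oai"


def get_record_url(doi: str, metadata_prefix: str = "oai_datacite") -> str:
    """Build an OAI-PMH GetRecord URL for a DOI such as '10.34894/HOINL3'."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the result matches the URL in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


url = get_record_url("10.34894/HOINL3")
```

The request returns descriptive metadata only. The data files themselves are under restricted access, as stated in the Rights entry.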