DALC - Dutch Abusive Language Corpus - Dataset

This repository contains the full-text version of the Dutch Abusive Language Corpus (DALC), a collection of Dutch-language tweets. Each record in the corpus has the following fields:

unique numeric id of the message
full text of the message, anonymised
annotation for abusive language
annotation for offensive language
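As an illustration of the four fields listed above, here is a minimal Python sketch of reading such a file. The column names (id, text, abusive, offensive), the label values and the sample rows are assumptions made up for this example; the actual headers and label sets are documented in the dataset description, and the data itself is under restricted access.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; the real CSV headers may differ.
FIELDS = ["id", "text", "abusive", "offensive"]


def read_corpus(handle):
    """Yield one dict per message from a CSV stream that has a header row."""
    reader = csv.DictReader(handle)
    missing = [f for f in FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for row in reader:
        yield {f: row[f] for f in FIELDS}


# Made-up placeholder rows standing in for the restricted-access data.
SAMPLE = (
    "id,text,abusive,offensive\n"
    "1,placeholder message one,NOT,NOT\n"
    "2,placeholder message two,ABUSIVE,OFFENSIVE\n"
)

rows = list(read_corpus(io.StringIO(SAMPLE)))

# Count how often each (hypothetical) abusive-language label occurs.
abusive_counts = Counter(r["abusive"] for r in rows)
print(abusive_counts)
```

To read a real file, pass an open file handle (for example one opened with newline="" and encoding="utf-8") to read_corpus in place of the in-memory sample, and adjust FIELDS to the headers actually present.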

A full description of the dataset and accompanying data statement is available at https://github.com/tommasoc80/DALC.

Twitter

Identifier
DOI	https://doi.org/10.34894/HOINL3
Metadata Access	https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/HOINL3
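The Metadata Access link above is an OAI-PMH GetRecord request. As a small sketch, the same URL can be assembled from its parts in Python; the function name get_record_url is hypothetical, and the code only builds the URL, it does not contact the server.

```python
from urllib.parse import urlencode

BASE = "https://dataverse.nl/oai"
DOI = "10.34894/HOINL3"


def get_record_url(doi, prefix="oai_datacite"):
    """Build an OAI-PMH GetRecord URL for a DOI (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the identifier reads as in the record above.
    return f"{BASE}?{urlencode(params, safe=':/')}"


url = get_record_url(DOI)
print(url)
```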

Provenance
Creator	Caselli, Tommaso; Weultjes, Marieke; Schelhaas, Arjan; Leistra, Folkert; van der Veen, Hylke; Robben, Menno; Timmerman, Gerben; Ruitenbeek, Waard; Zwart, Victor; van der Noord, Robin; Gnezdilov, Zhenja; Theodoridis, Dionysios
Publisher	DataverseNL
Contributor	Groningen Digital Competence Centre
Publication Year	2023
Rights	info:eu-repo/semantics/restrictedAccess
OpenAccess	false
Contact	Groningen Digital Competence Centre (University of Groningen)

Representation
Resource Type	Dataset
Format	text/csv; application/pdf
Size	226627; 579246; 1292778; 1009665; 80227; 267168; 819870
Version	1.2
Discipline	Humanities