Online Hostility Towards UK MPs, 2019-2022

This dataset contains tweets from X (formerly Twitter). Each tweet mentions one or more UK MPs, drawn from a subset selected for our study to give a diverse representation of political leanings. Each tweet is labelled for hostility and for the identity characteristic it targets (religion, race or gender), and each annotator also provides a confidence score for each label. Three annotators annotate each tweet; all are UK-based students of Computer Science and Politics. (A minimal sketch of the per-tweet record appears after the project summary below.)

Toxic and abusive language threatens the integrity of public dialogue and democracy. Abusive language, such as taunts, slurs, racism, extremism, crudeness, provocation and disguise, is generally considered offensive and insulting, and has been linked to political polarisation and citizen apathy; the rise of terrorism and radicalisation; and cyberbullying. In response, governments worldwide have enacted strong laws against abusive language that leads to hatred, violence and criminal offences against a particular group. These include legal obligations to moderate online material containing hateful or illegal language (i.e., to detect, evaluate, and potentially remove or delete it) in a timely manner, and social media companies have adopted even more stringent regulations in their terms of use. The last few years, however, have seen a significant surge in such abusive online behaviour, leaving governments, social media platforms and individuals struggling to deal with the consequences.

The responsible (i.e., effective, fair and unbiased) moderation of abusive language carries significant practical, cultural and legal challenges. While current legislation and public outrage demand a swift response, we do not yet have effective human or technical processes that can address this need. The widespread deployment of human content moderators is costly and inadequate on many levels: the nature of the work is psychologically challenging, and even significant efforts lag behind the deluge of data posted every second. At the same time, Artificial Intelligence (AI) solutions implemented to address abusive language have raised concerns about automated processes that affect fundamental human rights, such as freedom of expression and privacy, and about a lack of corporate transparency. Tellingly, the first moves to censor Internet content focused on terms used by the LGBTQ community and AIDS activism. It is no surprise, then, that content moderation has been dubbed by industry and media a "billion dollar problem."

Thus, this project addresses the overarching question: how can AI be better deployed to foster democracy by integrating freedom of expression, commitments to human rights and multicultural participation into the protection against abuse? Our project takes on the difficult and urgent issue of detecting and countering abusive language through a novel approach to AI-enhanced moderation that combines computer science with social science and humanities expertise and methods. We focus on two constituencies infamous for toxicity: politicians and gamers. Politicians, because of their public role, are regularly subjected to abusive language. Online gaming and gaming spaces have been identified as private "recruitment sites" for extreme political views and linked to offline violent attacks.
Specifically, our team will quantify the bias embedded within current content moderation systems, whose rigid definitions or determinations of abusive language may paradoxically create new forms of discrimination or bias based on identity, including sex, gender, ethnicity, culture, religion, political affiliation or other characteristics. We will offset these effects by producing more context-aware, dynamic systems of detection. Further, we will empower users by embedding these open-source tools within strategies of democratic counter-speech and community-based care and response. Project results will be shared broadly with policy, academic, industry, community and public stakeholders through open-access white papers, publications and other online materials. This project will engage and train the next generation of interdisciplinary scholars, crucial to the development of responsible AI. With its focus on robust AI methods for tackling online abuse in an effective and legally compliant manner, vital to the vigour of democratic societies, this research has wide-ranging implications and relevance for Canada and the UK.
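
The annotation scheme described above implies a simple per-tweet record: the tweet text, the MPs it mentions, and three annotations, each carrying a hostility label, a targeted identity characteristic and a confidence score. The following is a minimal Python sketch of such a record; all field names are illustrative and are not the dataset's actual column names.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Annotation:
        annotator_id: str        # one of the three annotators per tweet
        hostile: bool            # hostility label
        target: Optional[str]    # targeted characteristic: "religion", "race", "gender", or None
        confidence: float        # annotator's confidence in this label

    @dataclass
    class LabelledTweet:
        tweet_id: str
        text: str
        mp_handles: List[str]            # MP account(s) the tweet mentions
        annotations: List[Annotation]    # exactly three per tweet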

We collected data from X using the Twitter API v1.1. Tweets were collected based on UK MPs' user accounts (X handles). Four types of tweets were collected: tweets by the MPs, replies to their tweets, retweets by the MPs, and retweets of their tweets.
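
The collection code itself is not distributed with this record. As a rough sketch only, the four tweet types could be gathered through v1.1 endpoints using the tweepy library, as below. The credentials and handle are placeholders, and the v1.1 endpoints shown were available during the collection period (2019-2022) but have since been restricted.

    import tweepy

    # Placeholder credentials; API v1.1 uses OAuth 1.0a user authentication.
    auth = tweepy.OAuth1UserHandler(
        "CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET"
    )
    api = tweepy.API(auth, wait_on_rate_limit=True)

    mp_handles = ["example_mp_handle"]  # illustrative; the study's MP subset is not listed here

    for handle in mp_handles:
        # Tweets and retweets posted by the MP (statuses/user_timeline).
        own_tweets = api.user_timeline(screen_name=handle, count=200, tweet_mode="extended")

        # Replies to the MP and retweets of their tweets (search/tweets over recent posts).
        # Replies vs. retweets can be distinguished via each result's
        # in_reply_to_status_id and retweeted_status attributes.
        mentions = api.search_tweets(q="@" + handle, count=100, tweet_mode="extended")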

Identifier
DOI https://doi.org/10.5255/UKDA-SN-857099
Metadata Access https://datacatalogue.cessda.eu/oai-pmh/v0/oai?verb=GetRecord&metadataPrefix=oai_ddi25&identifier=4f33d9b96241a383785840e4870bc34785b8af620e4f40d0e5d9fa3d72fd6b9d
Provenance
Creator Pandya, M, University of Sheffield; Bontcheva, K, University of Sheffield; Maynard, D, University of Sheffield
Publisher UK Data Service
Publication Year 2024
Funding Reference FIC
Rights Mugdha Pandya, University of Sheffield; Diana Maynard, University of Sheffield; Kalina Bontcheva, University of Sheffield. The Data Collection is available from an external repository. Access is available via Related Resources.
OpenAccess true
Representation
Resource Type Text
Discipline Computer Science; Computer Science, Electrical and System Engineering; Engineering Sciences
Spatial Coverage United Kingdom