Context
This dataset was created for a Master's thesis in Digital Humanities by Ka Yee Suvini Lai (see Related Works for the thesis, titled "Emotion Classification, Topic Modelling, and Discourse Evaluation of Audience Responses to SNL's Fast Fashion Sketch on Social Media: Leveraging RoBERTa, BERTopic and Discourse Analysis"). The dataset consists of user comments on an SNL sketch titled 'Fast Fashion Ad', collected from YouTube, Instagram and TikTok (n=4028). It also contains the emotion classification and topic modelling outputs produced by RoBERTa and BERTopic for each comment.
Technical details
The dataset consists of the following columns (with explanations in brackets):
comment_text (the user comments on the SNL sketch from YouTube, Instagram and TikTok)
top_emotion (RoBERTa's output of the highest-scoring emotion for the comment)
emotion_scores (RoBERTa's output of all emotions and their scores for the comment)
topic (BERTopic's topic number for the comment)
topic_label (BERTopic's topic number and topic label for the comment)
probability (BERTopic's probability for the topic assigned to the comment)
This dataset is a .csv file and is interoperable across many digital tools. It contains the aggregated results of the RoBERTa and BERTopic Python pipelines (see Related Works for the source code).
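As an illustration of working with the file, the following is a minimal Python sketch that reads the .csv with the standard library, checks that the six columns listed above are present, and counts how often each top_emotion occurs within each topic_label. It is not part of the thesis pipelines. The function names are hypothetical, and the UTF-8 encoding and a header row matching the column names are assumptions.

```python
import csv
from collections import Counter

# Column names as listed in this description.
EXPECTED_COLUMNS = [
    "comment_text", "top_emotion", "emotion_scores",
    "topic", "topic_label", "probability",
]


def load_comments(handle):
    """Read all rows from an open CSV file and verify the documented columns exist."""
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def emotion_counts_by_topic(rows):
    """Count how often each top_emotion occurs within each topic_label."""
    counts = {}
    for row in rows:
        counts.setdefault(row["topic_label"], Counter())[row["top_emotion"]] += 1
    return counts


def summarise(path):
    """Print the three most frequent emotions per topic for the file at `path`."""
    # Assumption: the file is UTF-8 encoded; adjust `encoding` if it is not.
    with open(path, newline="", encoding="utf-8") as handle:
        rows = load_comments(handle)
    for label, counter in emotion_counts_by_topic(rows).items():
        print(label, counter.most_common(3))
```

Once access to the dataset has been granted, calling summarise with the local path of the downloaded file prints one line per topic. The emotion_scores column is left as raw text here because its exact serialisation is not specified in this description.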
Further details
To gain access to the dataset, please reach out to the author via email: ka.lai@tuwien.ac.at