License: CC BY 4.0
arXiv:2401.02001v1 [cs.SI] 03 Jan 2024
Close to Human-Level Agreement: Tracing Journeys of Violent Speech in Incel Posts with GPT-4-Enhanced Annotations
Daniel Matter (ORCID 0000-0003-4501-5612), Technical University of Munich, Richard-Wagner-Str. 1, Munich, Germany
Miriam Schirmer (ORCID 0000-0002-6593-3974), Technical University of Munich, Richard-Wagner-Str. 1, Munich, Germany
Nir Grinberg (ORCID 0000-0002-1277-894X), Ben-Gurion University of the Negev, Beersheba, Israel
Jürgen Pfeffer (ORCID 0000-0002-1677-150X), Technical University of Munich, Richard-Wagner-Str. 1, Munich, Germany

Abstract.
This study investigates the prevalence of violent language on incels.is. It evaluates GPT models (GPT-3.5 and GPT-4) for content analysis in the social sciences, focusing on the impact of varying prompts and batch sizes on coding quality for the detection of violent speech. We scraped over 6.9M posts from incels.is and categorized a random sample into non-violent, explicitly violent, and implicitly violent content. Two human coders annotated 3,028 posts, which we used to tune and evaluate GPT-3.5 and GPT-4 models across different prompts and batch sizes regarding coding reliability. The best-performing GPT-4 model annotated an additional 30,000 posts for further analysis. Our findings indicate an overall increase in violent speech over time on incels.is, both at the community and the individual level, particularly among more engaged users. While directed violent language decreases, non-directed violent language increases, and self-harm content shows a decline, especially after 2.5 years of user activity. We find substantial agreement between both human coders (κ = 0.65), while the best GPT-4 model yields good agreement with both human coders (κ = 0.54 for Human A and κ = 0.62 for Human B). Weighted and macro F1 scores further support this alignment.
Overall, this research provides practical means for accurately identifying violent language at a large scale that can aid content moderation and facilitate next-step research into the causal mechanisms and potential mitigations of violent expression and radicalization in communities like incels.is.
1. Introduction
The term “Incels” (“Involuntary Celibates”) refers to heterosexual men who, despite yearning for sexual and intimate relationships, find themselves unable to engage in such interactions. The online community of Incels has received increasing attention from both media and academic research, mainly due to its connections to real-world violence (Hoffman et al., 2020). Scrutiny intensified after the deaths of more than 50 individuals were linked to Incel-related incidents since 2014 (Lindsay, 2022). The rising trend of Incel-related violence underscores the societal risks posed by the views propagated within the community, especially those regarding women. In response, various strategic and administrative measures have been implemented. Notably, the social media platform Reddit officially banned the largest Incel subreddit, r/incel, for inciting violence against women (Hauser, 2017). The Centre for Research and Evidence on Security Threats has emphasized the community’s violent misogynistic tendencies, classifying its ideology as extremist (Brace, 2021). Similarly, the Texas Department of Public Safety has labeled Incels an “emerging domestic terrorism threat” (Texas Department of Public Safety, 2020).

Incels mainly congregate on online platforms. Within these forums, discussions frequently revolve around their feelings of inferiority compared to men known as “Chads,” who are portrayed as highly attractive and socially successful and who seemingly effortlessly attract romantic partners. Consequently, these forums often serve as outlets for expressing frustration and resentment, usually related to physical attractiveness, societal norms, and women’s perceived preferences in partner selection. These discussions can reinforce patterns of blame and victimization and provide an outlet for toxic ideologies, potentially contributing to a volatile atmosphere (Hoffman et al., 2020; O’Malley et al., 2022).
As public attention on Incels has grown, researchers have also begun to study the community more comprehensively, focusing on abusive language within Incel online communities (Jaki et al., 2019), Incels as a political movement (O’Donnell and Shor, 2022), or mental health aspects of Incel community members (Broyd et al., 2023). Despite the widespread public perception that links Incels predominantly with violence, several studies found that topics discussed in Incel online communities cover a broad range of subjects that are not necessarily violence-related, e.g., discussions on high school and college courses and online gaming (Mountford, 2018). Nevertheless, the prevalence of abusive and discriminatory language in Incel forums remains a significant concern as it perpetuates a hostile environment that can both isolate members further and potentially escalate into real-world actions.
Although existing research has shed light on essential facets of violence within Incel forums, a comprehensive, computational analysis that classifies various forms of violence expressed in Incel posts remains lacking. Additionally, to the best of our knowledge, no studies focus on trajectories of violent content on a user level.
Understanding violence within the Incel community at the user level is crucial for several reasons. It can provide insights into individual motivations, triggers, and behavioral patterns and reveal the extent of variance within the community, such as what proportion of users engage in violent rhetoric or actions. This nuanced approach could facilitate more targeted and effective intervention and prevention strategies.
Scope of this study. This paper seeks to identify the prevalence of violent content and its evolution over time in the largest Incel forum, incels.is. We initially perform manual labeling on a subset of the data to establish a baseline and ensure precise categorization for our violence typology. We then employ OpenAI’s GPT-3.5 and GPT-4 APIs to classify violent content in a larger sample of forum posts, enabling a comprehensive annotation of our dataset. We use the human baseline to assess performance and ensure the accuracy of the categorization process, and we discuss different experimental setups and challenges associated with annotating Incel posts. Finally, we examine how violent content within the forum evolves for each violence category, looking at the overall share of violent posts within the forum and at individual users’ trajectories across different time frames.
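To make this setup concrete, the following is a minimal sketch of how posts could be submitted in batches to the chat completions API using the openai Python client. The prompt wording, label names, model string, and batch size are illustrative assumptions, not the exact configuration used in this study.

```python
# Minimal sketch of batched classification with the OpenAI chat API.
# Prompt wording, label names, model string, and batch size are illustrative
# assumptions, not the exact setup used in this study.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

LABELS = ["non-violent", "implicitly violent", "explicitly violent"]

def classify_batch(posts: list[str], model: str = "gpt-4") -> list[str]:
    """Label a batch of posts; the model replies with one label per line."""
    numbered = "\n".join(f"{i + 1}. {p}" for i, p in enumerate(posts))
    prompt = (
        f"Classify each of the following forum posts as one of: {', '.join(LABELS)}. "
        "Answer with exactly one label per line, in the same order.\n\n" + numbered
    )
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce variance across repeated annotation runs
        messages=[{"role": "user", "content": prompt}],
    )
    return [line.strip().lower() for line in
            response.choices[0].message.content.strip().splitlines()]

# Usage: annotate a sample in batches of, e.g., five posts per request.
sample = ["post text 1", "post text 2", "post text 3", "post text 4", "post text 5"]
batch_size = 5
labels = [lab for i in range(0, len(sample), batch_size)
          for lab in classify_batch(sample[i:i + batch_size])]
```

Batching several posts per request reduces API cost and latency, but, as discussed later, the choice of batch size can itself affect coding quality.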
Our main contributions can be summarized as follows:
- We find that 15.7% of the posts analyzed in our study (N = 33,028) exhibit violent speech, with a subtle but statistically significant increase over time.
- We report a slight decrease in the use of violent language after users have been inactive for a prolonged period.
- We perform experiments for annotating data in complex and time-consuming labeling tasks and present an accessible, resource-efficient, yet accurate state-of-the-art method to enhance data annotation, combining manual annotation with GPT-4 (see the agreement-metric sketch after this list).
- In particular, we study the effect of batching on the performance of GPT-4 and find that the batch size significantly affects the model’s sensitivity.
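As referenced in the list above, agreement between coders (human–human and human–model) is reported via Cohen’s kappa together with weighted and macro F1 scores. The snippet below is a minimal sketch of how these metrics can be computed with scikit-learn; the label lists are invented stand-ins for aligned annotations.

```python
# Sketch of the agreement metrics reported in this paper, computed with
# scikit-learn. The two label lists below are invented stand-ins for aligned
# annotations from, e.g., a human coder and a GPT model.
from sklearn.metrics import cohen_kappa_score, f1_score

human = ["non-violent", "explicit", "non-violent", "implicit", "non-violent"]
model = ["non-violent", "explicit", "implicit", "implicit", "non-violent"]

kappa = cohen_kappa_score(human, model)                   # chance-corrected agreement
f1_weighted = f1_score(human, model, average="weighted")  # weighted by class support
f1_macro = f1_score(human, model, average="macro")        # unweighted mean over classes

print(f"kappa={kappa:.2f}  weighted-F1={f1_weighted:.2f}  macro-F1={f1_macro:.2f}")
```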
2. Related Work
Within computational social science (Lazer et al., 2009), a diverse body of research has explored the multifaceted landscape of Incel posts and forums. Natural language processing techniques have been harnessed to analyze the linguistic characteristics of Incel discourse, uncovering patterns of extreme negativity, misogyny, and self-victimization. Sentiment analysis, for instance, has illuminated the prevalence of hostile sentiments in these online spaces (Jaki et al., 2019; Pelzer et al., 2021), while topic modeling has unveiled recurrent themes and narratives driving discussions (Baele et al., 2021; Jelodar and Frank, 2021; Mountford, 2018). These studies offer invaluable insights into the dynamics of Incel online communication and form a foundation for more comprehensive research into the complexities of these communities.

2.1. Incels and Violence
Due to misogynistic and discriminating attitudes represented in Incel forums, research focusing on violent content constitutes the largest part of academic studies related to this community. Pelzer et al. (2021), for instance, conducted an analysis of toxic language across three major Incel forums, employing a fine-tuned BERT model trained on approximately 20,000 samples from various hate speech and toxic language datasets. Their research identified seven primary targets of toxicity: women, society, incels, self-hatred, ethnicities, forum users, and others. According to their analysis, expressions of animosity towards women emerged as the most prevalent form of toxic language (see Jaki et al. (2019) for a similar approach). On a broader level, Baele et al. (2021) employed a mix of qualitative and quantitative content analysis to explore the Incel ideology prevalent in an online community linked to recent acts of politically motivated violence. The authors emphasize that this particular community occupies a unique and extreme position within the broader misogynistic movement, featuring elements that not only encourage self-destructive behaviors but also have the potential to incite some members to commit targeted acts of violence against women, romantically successful men, or other societal symbols that represent perceived inequities.

2.2. Categorizing Violent Language Online
Effectively approaching harmful language requires a nuanced understanding of the diverse forms it takes online, encompassing elements such as “abusive language,” “hate speech,” and “toxic language” (Nobata et al., 2016; Schmidt and Wiegand, 2017). Due to their overlapping characteristics and varying degrees of subtlety and intensity, distinguishing between these types of content poses a great challenge. In addressing this complexity, Davidson et al. (2017) define hate speech as “language that is used to express hatred towards a targeted group or is intended to be derogatory, to humiliate, or to insult the members of the group.” Within the research community, this definition is further extended to include direct attacks against individuals or groups based on their race, ethnicity, or sex, which may manifest as offensive and toxic language (Salminen et al., 2020).

While hate speech has established itself as a comprehensive category to describe harmful language online, the landscape of hateful language phenomena spans a broad spectrum. Current research frequently focuses on specific subfields, e.g., toxic language, resulting in a fragmented picture marked by a diversity of definitions (Caselli et al., 2020; Waseem et al., 2017). What unites these definitions is their reliance on verbal violence as a fundamental element in characterizing various forms of harmful language. Verbal violence, in this context, encompasses language that is inherently aggressive, demeaning, or derogatory, with the intent to inflict harm or perpetuate discrimination (Kansok-Dusche et al., 2023; Soral et al., 2018; Waseem et al., 2017). Building on this foundation, we adopt the terminology of “violent language,” as it aptly encapsulates the intrinsic aggressive and harmful nature of such expressions. To operationalize violent language, Waseem et al. (2017) developed an elaborate categorization of violent language online. This categorization distinguishes between explicit and implicit violence, as well as directed and undirected forms of violence in online contexts, and it serves as the fundamental concept guiding the operationalization of violent speech in this paper (see Section 3.1). By addressing various degrees of violence, this concept encompasses language employed to offend, threaten, or explicitly indicate an intention to inflict emotional or physical harm upon an individual or group.
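To make the adopted typology concrete, the following is a schematic sketch (not the authors’ implementation) of how the two axes from Waseem et al. (2017) might be encoded during annotation; class, field, and function names are illustrative.

```python
# Schematic encoding of the Waseem et al. (2017) axes used to operationalize
# violent language: explicit vs. implicit and directed vs. undirected.
# Names and the example are illustrative, not drawn from the annotated data.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ViolenceAnnotation:
    explicit: bool  # True: overt threat or call to harm; False: veiled/coded aggression
    directed: bool  # True: aimed at an identifiable person or group; False: generalized

# A non-violent post carries no annotation; a violent one carries one combination.
def describe(label: Optional[ViolenceAnnotation]) -> str:
    if label is None:
        return "non-violent"
    kind = "explicit" if label.explicit else "implicit"
    target = "directed" if label.directed else "undirected"
    return f"{kind}, {target} violence"

print(describe(ViolenceAnnotation(explicit=True, directed=False)))  # explicit, undirected violence
```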
2.3. Classification of Violent Language with Language Models
Supervised classification algorithms have proven successful in detecting hateful language in online posts. Transformer-based models like HateBERT, designed to detect such language, have outperformed general BERT versions in English (Caselli et al., 2020). While HateBERT has proven effective in recognizing hateful language, its adaptability to diverse datasets depends on the compatibility of the annotated phenomena. Additionally, although these models exhibit proficiency in discovering broad patterns of hateful language, they are limited in discerning specific layers or categories, such as explicit or implicit forms of violence. The efficiency of the training process is further contingent on the volume of data, introducing potential challenges in terms of time and cost.

Large Language Models (LLMs) present a promising alternative to make data annotation more efficient and accessible. While specialized models like HateBERT often demand significant resources for training and fine-tuning on task-specific datasets, pre-trained LLMs might offer a more flexible, cost-effective solution without requiring additional, expensive transfer learning. Recent research has found that using LLMs, particularly OpenAI’s GPT variants, to augment small labeled datasets with synthetic data is effective in low-resource settings and for identifying rare classes (Møller et al., 2023). Further, Gilardi et al. (2023) found that GPT-3.5 outperforms crowd workers over a range of annotation tasks, demonstrating the potential of LLMs to drastically increase the efficiency of text classification. The efficacy of employing GPT-3.5 for text annotation, particularly for violent language, has been substantiated, revealing a robust accuracy of 80% compared to crowd workers in identifying harmful language online (Li et al., 2023). Even in more challenging annotation tasks, like detecting implicit hate speech, GPT-3.5 demonstrated commendable accuracy, correctly classifying 80% of the provided samples (Huang et al., 2023).
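For contrast with the LLM-based annotation sketched earlier, the snippet below outlines what the supervised alternative discussed above could look like: fine-tuning HateBERT with a three-way violence classification head. The GroNLP/hateBERT checkpoint is the model published by Caselli et al. (2020); the label count, dataset placeholders, and training settings are assumptions for illustration, not the configuration of any experiment in this paper.

```python
# Sketch of the supervised alternative: fine-tuning HateBERT (Caselli et al., 2020)
# with a three-way violence classification head. Label count, dataset placeholders,
# and training settings are illustrative assumptions.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("GroNLP/hateBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "GroNLP/hateBERT", num_labels=3  # non-violent / implicitly / explicitly violent
)

def tokenize(batch):
    # Expects a dataset with a "text" column; "label" holds integer class ids.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

# `train_ds` / `eval_ds` stand in for task-specific annotated datasets
# (e.g., datasets.Dataset objects); fine-tuning would then look like:
# trainer = Trainer(
#     model=model,
#     args=TrainingArguments(output_dir="hatebert-violence", num_train_epochs=3),
#     train_dataset=train_ds.map(tokenize, batched=True),
#     eval_dataset=eval_ds.map(tokenize, batched=True),
# )
# trainer.train()
```

The key trade-off this illustrates is that the supervised route requires a sizable task-specific labeled corpus and compute for fine-tuning, whereas the prompt-based route needs neither but depends on prompt design and API cost.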
While these results showcase the effectiveness of GPT-3.5 in text annotation, there remains room for improvement, particularly in evaluating prompts and addressing the inherent challenges associated with establishing a definitive ground truth in complex classification tasks like violent language classification (Li et al., 2023).