



License: CC BY 4.0
arXiv:2401.02001v1 [cs.SI] 03 Jan 2024

Close to Human-Level Agreement: Tracing Journeys of Violent Speech in Incel Posts with GPT-4-Enhanced Annotations

Daniel Matter (ORCID 0000-0003-4501-5612), Technical University of Munich, Richard-Wagner-Str. 1, Munich, Germany
Miriam Schirmer (ORCID 0000-0002-6593-3974), Technical University of Munich, Richard-Wagner-Str. 1, Munich, Germany
Nir Grinberg (ORCID 0000-0002-1277-894X), Ben-Gurion University of the Negev, Beersheba, Israel
Jürgen Pfeffer (ORCID 0000-0002-1677-150X), Technical University of Munich, Richard-Wagner-Str. 1, Munich, Germany

Abstract.

This study investigates the prevalence of violent language on incels.is. It evaluates GPT models (GPT-3.5 and GPT-4) for content analysis in the social sciences, focusing on the impact of varying prompts and batch sizes on coding quality for the detection of violent speech. We scraped over 6.9M posts from incels.is and categorized a random sample into non-violent, explicitly violent, and implicitly violent content. Two human coders annotated 3,028 posts, which we used to tune and evaluate GPT-3.5 and GPT-4 models across different prompts and batch sizes regarding coding reliability. The best-performing GPT-4 model annotated an additional 30,000 posts for further analysis.
Our findings indicate an overall increase in violent speech over time on incels.is, both at the community and the individual level, particularly among more engaged users. While directed violent language decreases, non-directed violent language increases, and self-harm content shows a decline, especially after 2.5 years of user activity. We find substantial agreement between the two human coders (κ = .65), while the best GPT-4 model yields good agreement with both human coders (κ = 0.54 for Human A and κ = 0.62 for Human B). Weighted and macro F1 scores further support this alignment.
Overall, this research provides practical means for accurately identifying violent language at a large scale that can aid content moderation and facilitate next-step research into the causal mechanisms and potential mitigations of violent expression and radicalization in communities like incels.is.

1. Introduction

The term “Incels” (“Involuntary Celibates”) refers to heterosexual men who, despite yearning for sexual and intimate relationships, find themselves unable to engage in such interactions. The online community of Incels has been subject to increasing attention from both media and academic research, mainly due to its connections to real-world violence (Hoffman et al., 2020). Scrutiny intensified after the deaths of more than 50 individuals were linked to Incel-related incidents since 2014 (Lindsay, 2022). The rising trend of Incel-related violence underscores the societal risks posed by the views propagated within the community, especially those regarding women. In response, various strategic and administrative measures have been implemented. Notably, the social media platform Reddit officially banned the largest Incel subreddit, r/incel, for inciting violence against women (Hauser, 2017). The Centre for Research and Evidence on Security Threats has emphasized the community’s violent misogynistic tendencies, classifying its ideology as extremist (Brace, 2021). Similarly, the Texas Department of Public Safety has labeled Incels as an “emerging domestic terrorism threat” (Texas Department of Public Safety, 2020).
Incels congregate mainly on online platforms. Within these forums, discussions frequently revolve around their feelings of inferiority compared to men known as “Chads,” who are portrayed as highly attractive, socially successful, and able to attract romantic partners seemingly without effort. Consequently, these forums often serve as outlets for expressing frustration and resentment, usually related to physical attractiveness, societal norms, and women’s perceived preferences in partner selection. Such discussions provide an outlet for toxic ideologies and can reinforce patterns of blame and victimization that potentially contribute to a volatile atmosphere (Hoffman et al., 2020; O’Malley et al., 2022).
As public attention on Incels has grown, researchers have also begun to study the community more comprehensively, focusing on abusive language within Incel online communities (Jaki et al., 2019), Incels as a political movement (O’Donnell and Shor, 2022), or mental health aspects of Incel community members (Broyd et al., 2023). Despite the widespread public perception that links Incels predominantly with violence, several studies found that topics discussed in Incel online communities cover a broad range of subjects that are not necessarily violence-related, e.g., discussions on high school and college courses and online gaming (Mountford, 2018). Nevertheless, the prevalence of abusive and discriminatory language in Incel forums remains a significant concern as it perpetuates a hostile environment that can both isolate members further and potentially escalate into real-world actions.
Although existing research has shed light on essential facets of violence within Incel forums, a comprehensive, computational analysis that classifies various forms of violence expressed in Incel posts remains lacking. Additionally, to the best of our knowledge, no studies focus on trajectories of violent content on a user level.
Understanding violence within the Incel community at the user level is crucial for several reasons. It can provide insights into individual motivations, triggers, and behavioral patterns and reveal the extent of variance within the community, such as what proportion of users engage in violent rhetoric or actions. This nuanced approach could facilitate more targeted and effective intervention and prevention strategies.
Scope of this study. This paper seeks to identify the prevalence of violent content and its evolution over time in the largest Incel forum, incels.is. We initially perform manual labeling on a subset of the data to establish a baseline and ensure precise categorization for our violence typology. We then employ OpenAI’s GPT-3.5 and GPT-4 APIs to classify violence in a larger sample of forum posts, enabling a comprehensive annotation of our dataset. We use the human baseline to assess the performance and accuracy of the categorization process, and we discuss different experimental setups and challenges associated with annotating Incel posts. Finally, we examine how violent content within the forum evolves for each violence category, looking at the overall share of violent posts within the forum and at individual users within different time frames.
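Since coding quality is judged against this human baseline, reliability metrics such as Cohen’s κ and weighted/macro F1 play a central role; the snippet below is a minimal scikit-learn sketch with invented toy labels, not the study’s data:

```python
from sklearn.metrics import cohen_kappa_score, f1_score

# Toy stand-in labels; the study compares coders on 3,028 annotated posts.
human = ["non-violent", "explicit", "implicit", "non-violent", "implicit", "explicit"]
model = ["non-violent", "explicit", "non-violent", "non-violent", "implicit", "explicit"]

# Cohen's kappa: pairwise agreement between two coders, corrected for chance.
print("kappa:", cohen_kappa_score(human, model))

# Treating one coder as the reference, F1 summarizes per-class overlap.
print("weighted F1:", f1_score(human, model, average="weighted"))
print("macro F1:", f1_score(human, model, average="macro"))
```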
Our main contributions can be summarized as follows:

  • We find that 15.7% of the posts analyzed in our study (N = 33,028) exhibit violent speech, with a subtle but statistically significant increase over time.

  • We report a slight decrease in the use of violent language after users have been inactive for a prolonged period.

  • We perform experiments on annotating data in complex and time-consuming labeling tasks and present an accessible, resource-efficient, yet accurate state-of-the-art method to enhance data annotation, combining manual annotation with GPT-4.

  • In particular, we study the effect of batching on the performance of GPT-4 and find that the batch size significantly affects the model’s sensitivity; a minimal sketch of batched prompting follows this list.
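As an illustration of batched prompting, the sketch below groups posts into a single request via the OpenAI Python client; the prompt wording, label parsing, and batch handling are illustrative assumptions, not the paper’s exact setup:

```python
# Hypothetical batched-annotation sketch; not the study's actual prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = "non-violent, implicitly violent, explicitly violent"

def annotate_batch(posts: list[str], model: str = "gpt-4") -> list[str]:
    # Number the posts so the model can return one label per line, in order.
    numbered = "\n".join(f"{i + 1}. {p}" for i, p in enumerate(posts))
    prompt = (
        f"Classify each of the following posts as one of: {LABELS}.\n"
        f"Return exactly one label per line, in the same order.\n\n{numbered}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # minimize sampling variance for coding tasks
    )
    # Naive parsing; a real pipeline must validate label count and names.
    return response.choices[0].message.content.strip().splitlines()

# Larger batches lower cost per post but can shift the model's sensitivity.
print(annotate_batch(["example post one ...", "example post two ..."]))
```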
2. Related Work

Within computational social science (Lazer et al., 2009), a diverse body of research has explored the multifaceted landscape of Incel posts and forums. Natural language processing techniques have been harnessed to analyze the linguistic characteristics of Incel discourse, uncovering patterns of extreme negativity, misogyny, and self-victimization. Sentiment analysis, for instance, has illuminated the prevalence of hostile sentiments in these online spaces (Jaki et al., 2019; Pelzer et al., 2021), while topic modeling has unveiled recurrent themes and narratives driving discussions (Baele et al., 2021; Jelodar and Frank, 2021; Mountford, 2018). These studies offer invaluable insights into the dynamics of Incel online communication and serve as a foundation for more comprehensive research into the complexities of these communities.

2.1. Incels and Violence

Due to the misogynistic and discriminatory attitudes represented in Incel forums, research focusing on violent content constitutes the largest part of academic studies related to this community. Pelzer et al. (2021), for instance, conducted an analysis of toxic language across three major Incel forums, employing a fine-tuned BERT model trained on approximately 20,000 samples from various hate speech and toxic language datasets. Their research identified seven primary targets of toxicity: women, society, incels, self-hatred, ethnicities, forum users, and others. According to their analysis, expressions of animosity towards women emerged as the most prevalent form of toxic language (see Jaki et al. (2019) for a similar approach). On a broader level, Baele et al. (2021) employed a mix of qualitative and quantitative content analysis to explore the Incel ideology prevalent in an online community linked to recent acts of politically motivated violence. The authors emphasize that this particular community occupies a unique and extreme position within the broader misogynistic movement, featuring elements that not only encourage self-destructive behaviors but also have the potential to incite some members to commit targeted acts of violence against women, romantically successful men, or other societal symbols that represent perceived inequities.

2.2. Categorizing Violent Language Online

Effectively approaching harmful language requires a nuanced understanding of the diverse forms it takes online, encompassing elements such as “abusive language”, “hate speech”, and “toxic language” (Nobata et al., 2016; Schmidt and Wiegand, 2017). Due to their overlapping characteristics and varying degrees of subtlety and intensity, distinguishing between these types of content poses a great challenge. In addressing this complexity, Davidson et al. (2017) define hate speech as “language that is used to express hatred towards a targeted group or is intended to be derogatory, to humiliate, or to insult the members of the group.” Within the research community, this definition is further extended to include direct attacks against individuals or groups based on their race, ethnicity, or sex, which may manifest as offensive and toxic language (Salminen et al., 2020).
While hate speech has established itself as a comprehensive category to describe harmful language online, the landscape of hateful language phenomena spans a broad spectrum. Current research frequently focuses on specific subfields, e.g., toxic language, resulting in a fragmented picture marked by a diversity of definitions (Caselli et al., 2020; Waseem et al., 2017). What unites these definitions is their reliance on verbal violence as a fundamental element in characterizing various forms of harmful language. Verbal violence, in this context, encompasses language that is inherently aggressive, demeaning, or derogatory, with the intent to inflict harm or perpetuate discrimination (Kansok-Dusche et al., 2023; Soral et al., 2018; Waseem et al., 2017). Building on this foundation, we adopt the terminology of “violent language”, as it aptly encapsulates the aggressive and harmful nature inherent in such expressions. To operationalize violent language, Waseem et al. (2017) developed an elaborate categorization of violent language online. This categorization distinguishes between explicit and implicit violence, as well as directed and undirected forms of violence in online contexts, and will serve as the fundamental concept guiding the operationalization of violent speech in this paper (see Section 3.1). By addressing various degrees of violence, this concept encompasses language employed to offend, threaten, or explicitly indicate an intention to inflict emotional or physical harm upon an individual or group.
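To make the two-axis typology concrete, one possible encoding as a small data structure is sketched below; the class and label names are illustrative, not the paper’s actual coding scheme:

```python
# Illustrative encoding of the Waseem et al. (2017) typology axes.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Explicitness(Enum):
    NON_VIOLENT = "non-violent"
    IMPLICIT = "implicitly violent"  # veiled or coded aggression
    EXPLICIT = "explicitly violent"  # overt threats or calls for harm

class Target(Enum):
    DIRECTED = "directed"      # aimed at a specific individual or group
    UNDIRECTED = "undirected"  # generalized violent expression

@dataclass
class ViolenceLabel:
    explicitness: Explicitness
    target: Optional[Target]  # target axis is undefined for non-violent posts

label = ViolenceLabel(Explicitness.EXPLICIT, Target.DIRECTED)
print(label)
```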

2.3. Classification of Violent Language with Language Models

Supervised classification algorithms have proven successful in detecting hateful language in online posts. Transformer-based models like HateBERT, designed to detect such language, have outperformed general BERT versions in English (Caselli et al., 2020). While HateBERT has proven effective in recognizing hateful language, its adaptability to diverse datasets depends on the compatibility of the annotated phenomena. Additionally, although these models exhibit proficiency in discovering broad patterns of hateful language, they are limited in discerning specific layers or categories, such as explicit or implicit forms of violence. The efficiency of the training process is further contingent on the volume of data, introducing potential challenges in terms of time and cost.
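For a sense of what this transfer learning involves, the sketch below adapts the publicly available GroNLP/hateBERT checkpoint to a three-class task with Hugging Face Transformers; the toy data and hyperparameters are stand-ins for demonstration:

```python
# Minimal fine-tuning sketch; a real run would use thousands of labeled posts.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("GroNLP/hateBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "GroNLP/hateBERT", num_labels=3)  # 0=non-violent, 1=implicit, 2=explicit

# Toy stand-in corpus with the three-class labeling described above.
train_ds = Dataset.from_dict({
    "text": ["an innocuous example post", "an explicitly violent example post"],
    "label": [0, 2],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hatebert-violence", num_train_epochs=1),
    train_dataset=train_ds.map(tokenize, batched=True),
)
trainer.train()  # fine-tunes the classification head together with the encoder
```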
Large Language Models (LLMs) present a promising alternative to make data annotation more efficient and accessible. While specialized models like HateBERT often demand significant resources for training and fine-tuning on task-specific datasets, pre-trained LLMs might offer a more flexible, cost-effective solution without requiring additional, expensive transfer learning. Recent research has found that using LLMs, particularly OpenAI’s GPT variants, to augment small labeled datasets with synthetic data is effective in low-resource settings and for identifying rare classes (Møller et al., 2023). Further, Gilardi et al. (2023) found that GPT-3.5 outperforms crowd workers over a range of annotation tasks, demonstrating the potential of LLMs to drastically increase the efficiency of text classification. The efficacy of employing GPT-3.5 for text annotation, particularly for violent language, has been substantiated, revealing a robust accuracy of 80% compared to crowd workers in identifying harmful language online (Li et al., 2023). Even in more challenging annotation tasks, like detecting implicit hate speech, GPT-3.5 demonstrated commendable accuracy, correctly classifying 80% of the provided samples (Huang et al., 2023).
While these results showcase the effectiveness of GPT-3.5 in text annotation, there remains room for improvement, particularly in evaluating prompts and addressing the inherent challenges associated with establishing a definitive ground truth in complex classification tasks like violent language classification (Li et al., 2023).

2.4. User Behaviour in Incel Forums

The rise of research on the Incel community has also shifted the spotlight to users within the “Incelverse”, driven by both qualitative and computational approaches. Scholars have embarked on demographic analyses, identifying prevalent characteristics, such as social isolation, and prevailing beliefs within the Incelverse. A recent study on user characteristics in Incel forums analyzed users from three major Incel platforms using network analysis and community detection to determine their primary concerns and participation patterns. The findings suggest that users frequently interact with content related to mental health and relationships and are active in other forums with hateful content (Stijelja and Mishara, 2023). Similarly, Pelzer et al. (2021) investigated the spread of toxic language across different Incel platforms, revealing that engagement with toxic language is associated with different subgroups or ideologies within the Incel communities. However, these studies have generally focused on smaller subsets of users and have not examined user behavior across the entirety of the incels.is forum. This gap in research is noteworthy, especially when broader studies indicate that content from hateful users tends to spread more quickly and reach a larger audience than content from non-hateful users (Mathew et al., 2019).
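To illustrate the kind of network analysis and community detection referenced above, the sketch below builds a toy reply graph between users and extracts communities with NetworkX; the invented edges and the greedy-modularity method are illustrative choices, not necessarily those of the cited study:

```python
# Toy reply-network community detection; edges and weights are invented.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.Graph()
# Edge (u, v, w): user u replied to user v a total of w times.
G.add_weighted_edges_from([
    ("user_a", "user_b", 3),
    ("user_b", "user_c", 1),
    ("user_c", "user_d", 5),
    ("user_d", "user_a", 2),
])

# Greedy modularity maximization groups densely interacting users together.
for i, members in enumerate(greedy_modularity_communities(G, weight="weight")):
    print(f"community {i}: {sorted(members)}")
```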

2.5. Summary

Over the last few years, the Incel community has become a subject of growing academic interest due to its complex interplay of extreme views and connections to real-world violence. While existing studies have shed light on its linguistic and ideological aspects, most have not conducted a thorough user-level analysis across larger forums. Our study aims to bridge this research gap by categorizing and examining violent content on incels.is both at the forum level and at the individual user level. Using manual annotation in conjunction with GPT-4 for this task offers a cost-effective and flexible approach, given the model’s pre-trained capabilities for understanding a wide range of textual nuances.
