Detecting sexism in social media: an empirical analysis of linguistic patterns and strategies

Applied Intelligence (IF 3.5; Zone 2, Computer Science; JCR Q2, Computer Science, Artificial Intelligence). Pub Date: 2024-09-03. DOI: 10.1007/s10489-024-05795-2

Francisco Rodríguez-Sánchez, Jorge Carrillo-de-Albornoz, Laura Plaza

Journal article. Applied Intelligence, volume 54, issue 21, pages 10995-11019. https://link.springer.com/article/10.1007/s10489-024-05795-2

Abstract

With the rise of social networks, there has been a marked increase in offensive content targeting women, ranging from overt acts of hatred to subtler, often overlooked forms of sexism. The EXIST (sEXism Identification in Social neTworks) competition, initiated in 2021, aimed to advance research in automatically identifying these forms of online sexism. However, the results revealed the multifaceted nature of sexism and emphasized the need for robust systems to detect and classify such content. In this study, we provide an extensive analysis of sexism, highlighting the characteristics and diverse manifestations of sexism across multiple languages on social networks. To achieve this objective, we conducted a detailed analysis of the EXIST dataset to evaluate its capacity to represent various types of sexism. Moreover, we analyzed the systems submitted to the EXIST competition to identify the most effective methodologies and resources for the automated detection of sexism. We employed statistical methods to discern textual patterns related to different categories of sexism, such as stereotyping, misogyny, and sexual violence. Additionally, we investigated linguistic variations in categories of sexism across different languages and platforms. Our results suggest that the EXIST dataset covers a broad spectrum of sexist expressions, from the explicit to the subtle. We observe significant differences in the portrayal of sexism across languages; English texts predominantly feature sexual connotations, whereas Spanish texts tend to reflect neosexism. Across both languages, objectification and misogyny prove to be the most challenging to detect, which is attributable to the varied vocabulary associated with these forms of sexism. Additionally, we demonstrate that models trained on platforms like Twitter can effectively identify sexist content on less-regulated platforms such as Gab. 
Building on these insights, we introduce a transformer-based system with data augmentation techniques that outperforms competition benchmarks. Our work contributes to the field by enhancing the understanding of online sexism and advancing the technological capabilities for its detection.
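The abstract mentions using statistical methods to find textual patterns tied to each category of sexism, but it does not say which statistic was used. As a minimal sketch of one common option, the block below scores terms with a smoothed log-odds ratio between one category and all the others. The function names, the smoothing constant `alpha`, and the placeholder corpus are all assumptions made for illustration; none of them come from the paper or the EXIST dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption); keeps alphabetic tokens only.
    return [tok for tok in text.lower().split() if tok.isalpha()]


def log_odds(docs_by_label, target, alpha=0.5):
    """Smoothed log-odds ratio of each term in `target` versus all other labels.

    Positive scores mean the term is relatively more frequent in `target`.
    Assumes the combined vocabulary holds at least two distinct terms.
    """
    in_counts, out_counts = Counter(), Counter()
    for label, docs in docs_by_label.items():
        bucket = in_counts if label == target else out_counts
        for doc in docs:
            bucket.update(tokenize(doc))

    vocab = set(in_counts) | set(out_counts)
    n_in = sum(in_counts.values())
    n_out = sum(out_counts.values())
    v = len(vocab)

    scores = {}
    for term in vocab:
        # Additive smoothing keeps unseen terms from producing log(0).
        p_in = (in_counts[term] + alpha) / (n_in + alpha * v)
        p_out = (out_counts[term] + alpha) / (n_out + alpha * v)
        scores[term] = math.log(p_in / (1 - p_in)) - math.log(p_out / (1 - p_out))
    return scores


# Invented placeholder corpus with neutral tokens; NOT data from EXIST.
toy = {
    "cat_a": ["foo foo bar", "foo baz"],
    "cat_b": ["bar qux", "qux baz"],
}
scores = log_odds(toy, "cat_a")
top_term = max(scores, key=scores.get)
print(top_term, round(scores[top_term], 3))
```

Ranking terms by this score per category gives a quick view of which vocabulary separates one label from the rest. A category whose top scores are low and spread over many terms would be consistent with the abstract's remark that varied vocabulary makes some categories harder to detect.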

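Source journal

Applied Intelligence (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 6.60
Self-citation rate: 20.80%
Articles published: 1361
Review time: 5.9 months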

Journal description: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.