Hate Speech Classifiers Learn Normative Social Stereotypes

Transactions of the Association for Computational Linguistics · IF 4.2 · CAS Tier 1 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2023-03-01 · DOI: 10.1162/tacl_a_00550 · pp. 300-319
A. Davani, M. Atari, Brendan Kennedy, Morteza Dehghani
{"title":"仇恨言语分类器学习规范的社会刻板印象","authors":"A. Davani, M. Atari, Brendan Kennedy, Morteza Dehghani","doi":"10.1162/tacl_a_00550","DOIUrl":null,"url":null,"abstract":"Social stereotypes negatively impact individuals’ judgments about different groups and may have a critical role in understanding language directed toward marginalized groups. Here, we assess the role of social stereotypes in the automated detection of hate speech in the English language by examining the impact of social stereotypes on annotation behaviors, annotated datasets, and hate speech classifiers. Specifically, we first investigate the impact of novice annotators’ stereotypes on their hate-speech-annotation behavior. Then, we examine the effect of normative stereotypes in language on the aggregated annotators’ judgments in a large annotated corpus. Finally, we demonstrate how normative stereotypes embedded in language resources are associated with systematic prediction errors in a hate-speech classifier. The results demonstrate that hate-speech classifiers reflect social stereotypes against marginalized groups, which can perpetuate social inequalities when propagated at scale. This framework, combining social-psychological and computational-linguistic methods, provides insights into sources of bias in hate-speech moderation, informing ongoing debates regarding machine learning fairness.","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"11 1","pages":"300-319"},"PeriodicalIF":4.2000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Hate Speech Classifiers Learn Normative Social Stereotypes\",\"authors\":\"A. Davani, M. Atari, Brendan Kennedy, Morteza Dehghani\",\"doi\":\"10.1162/tacl_a_00550\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Social stereotypes negatively impact individuals’ judgments about different groups and may have a critical role in understanding language directed toward marginalized groups. Here, we assess the role of social stereotypes in the automated detection of hate speech in the English language by examining the impact of social stereotypes on annotation behaviors, annotated datasets, and hate speech classifiers. Specifically, we first investigate the impact of novice annotators’ stereotypes on their hate-speech-annotation behavior. Then, we examine the effect of normative stereotypes in language on the aggregated annotators’ judgments in a large annotated corpus. Finally, we demonstrate how normative stereotypes embedded in language resources are associated with systematic prediction errors in a hate-speech classifier. The results demonstrate that hate-speech classifiers reflect social stereotypes against marginalized groups, which can perpetuate social inequalities when propagated at scale. 
This framework, combining social-psychological and computational-linguistic methods, provides insights into sources of bias in hate-speech moderation, informing ongoing debates regarding machine learning fairness.\",\"PeriodicalId\":33559,\"journal\":{\"name\":\"Transactions of the Association for Computational Linguistics\",\"volume\":\"11 1\",\"pages\":\"300-319\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2023-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Transactions of the Association for Computational Linguistics\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://doi.org/10.1162/tacl_a_00550\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transactions of the Association for Computational Linguistics","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1162/tacl_a_00550","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 10

Abstract

Social stereotypes negatively impact individuals’ judgments about different groups and may have a critical role in understanding language directed toward marginalized groups. Here, we assess the role of social stereotypes in the automated detection of hate speech in the English language by examining the impact of social stereotypes on annotation behaviors, annotated datasets, and hate speech classifiers. Specifically, we first investigate the impact of novice annotators’ stereotypes on their hate-speech-annotation behavior. Then, we examine the effect of normative stereotypes in language on the aggregated annotators’ judgments in a large annotated corpus. Finally, we demonstrate how normative stereotypes embedded in language resources are associated with systematic prediction errors in a hate-speech classifier. The results demonstrate that hate-speech classifiers reflect social stereotypes against marginalized groups, which can perpetuate social inequalities when propagated at scale. This framework, combining social-psychological and computational-linguistic methods, provides insights into sources of bias in hate-speech moderation, informing ongoing debates regarding machine learning fairness.
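The abstract gives no implementation details, but its central quantity, systematic prediction errors that differ by targeted group, can be made concrete with a short audit. The sketch below is a hypothetical illustration, not the paper's method or data: it computes per-group false-positive and false-negative rates for any classifier over gold-annotated examples. The function name `group_error_rates`, the keyword stand-in classifier, and the toy data are all assumptions for demonstration.

```python
# A minimal sketch (not the authors' code) of auditing a hate-speech
# classifier for systematic, group-level prediction errors. All names,
# the toy keyword "classifier", and the data below are hypothetical
# stand-ins for illustration only.
from collections import defaultdict

def group_error_rates(examples, predict):
    """Per-group false-positive and false-negative rates.

    examples: iterable of (text, mentioned_group, is_hate) triples,
              where is_hate is the gold annotation.
    predict:  callable text -> bool (True = flagged as hate speech).
    """
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for text, group, is_hate in examples:
        flagged = bool(predict(text))
        c = counts[group]
        if is_hate:
            c["pos"] += 1
            c["fn"] += int(not flagged)   # hateful text the model missed
        else:
            c["neg"] += 1
            c["fp"] += int(flagged)       # benign text the model flagged
    return {
        g: {"fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
            "fnr": c["fn"] / c["pos"] if c["pos"] else float("nan")}
        for g, c in counts.items()
    }

if __name__ == "__main__":
    # Toy stand-in classifier: flags any text containing a trigger word.
    predict = lambda text: any(w in text.lower() for w in {"vermin", "attack"})
    data = [
        ("group A are vermin", "A", True),
        ("group A will attack this problem head-on", "A", False),
        ("we should attack group B", "B", True),
        ("celebrating group B heritage today", "B", False),
    ]
    for group, rates in sorted(group_error_rates(data, predict).items()):
        print(group, rates)
```

Running this prints a higher false-positive rate for group A than for group B (the benign "attack this problem" sentence is flagged), which is the shape of disparity the paper attributes to normative stereotypes absorbed from annotators and language resources.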
Source journal

CiteScore: 32.60
Self-citation rate: 4.60%
Articles published: 58
Review time: 8 weeks
Journal introduction: The highly regarded quarterly journal Computational Linguistics has a companion journal called Transactions of the Association for Computational Linguistics. This open access journal publishes articles in all areas of natural language processing and is an important resource for academic and industry computational linguists, natural language processing experts, artificial intelligence and machine learning investigators, cognitive scientists, speech specialists, as well as linguists and philosophers. The journal disseminates work of vital relevance to these professionals on an annual basis.
Latest articles from this journal

General then Personal: Decoupling and Pre-training for Personalized Headline Generation
MissModal: Increasing Robustness to Missing Modality in Multimodal Sentiment Analysis
Removing Backdoors in Pre-trained Models by Regularized Continual Pre-training
Learning More from Mixed Emotions: A Label Refinement Method for Emotion Recognition in Conversations
An Efficient Self-Supervised Cross-View Training For Sentence Embedding