Negative Associations in Word Embeddings Predict Anti-black Bias across Regions-but Only via Name Frequency.

Austin van Loon, Salvatore Giorgi, Robb Willer, Johannes Eichstaedt
Proceedings of the ... International AAAI Conference on Weblogs and Social Media, vol. 16, pp. 1419-1424. Published 2022-05-31.
DOI: 10.1609/icwsm.v16i1.19399
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10147343/pdf/nihms-1842382.pdf
Citations: 3

Abstract

The word embedding association test (WEAT) is an important method for measuring linguistic biases against social groups, such as ethnic minorities, in large text corpora. It does so by comparing the semantic relatedness between words prototypical of the groups (e.g., names unique to those groups) and attribute words (e.g., 'pleasant' and 'unpleasant' words). We show that anti-Black WEAT estimates computed from geo-tagged social media data at the level of metropolitan statistical areas strongly correlate with several measures of racial animus, even when controlling for sociodemographic covariates. However, we also show that every one of these correlations is explained by a third variable: the frequency of Black names relative to White names in the underlying corpora. This occurs because word embeddings tend to place positive words near frequent words, and negative words near rare words, in the estimated semantic space. Because the frequency of Black names on social media is strongly correlated with the share of Black Americans in the population, this produces spuriously high anti-Black WEAT estimates wherever few Black Americans live. These findings suggest that research using the WEAT to measure bias should account for term frequency, and they illustrate the potential consequences of using black-box models such as word embeddings to study human cognition and behavior.
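The two technical ideas in the abstract, a WEAT score built from cosine similarities and a third variable that accounts for a correlation, can be sketched in a few lines. This is a minimal illustration assuming the effect-size form of WEAT as it is commonly defined, with a partial correlation used as a simple confound check. It is not the authors' code, their exact WEAT variant, or their statistical model. All function names (`cosine`, `association`, `weat_effect_size`, `pearson`, `partial_correlation`) are hypothetical, and the toy vectors and region values are invented for illustration.

```python
# Hypothetical sketch: a WEAT-style effect size and a partial-correlation check.
# Not the authors' implementation; all names and data below are illustrative.
import math
from statistics import mean, stdev
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w: Vector, A: List[Vector], B: List[Vector]) -> float:
    """s(w, A, B): mean cosine similarity of w to attribute set A minus that to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B) -> float:
    """Difference between target sets X and Y in mean association with A vs. B,
    divided by the sample standard deviation of s(w, A, B) over X and Y pooled."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def partial_correlation(xs, ys, zs) -> float:
    """Correlation of xs and ys after partialling out the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy 2-D "embeddings" (invented, not real data).
# X = White-name vectors, Y = Black-name vectors, A = pleasant, B = unpleasant,
# so a positive effect size means White names sit relatively closer to pleasant
# words than Black names do (an anti-Black association in WEAT terms).
A = [(1.0, 0.0), (0.9, 0.1)]
B = [(0.0, 1.0), (0.1, 0.9)]
X = [(1.0, 0.2), (0.8, 0.1)]
Y = [(0.2, 1.0), (0.1, 0.8)]
print(weat_effect_size(X, Y, A, B))  # positive for this toy configuration

# Toy region-level data (invented): z stands in for a relative Black-name
# frequency score; the WEAT score and the animus measure are both driven by z
# plus noise that is uncorrelated with z and with each other.
z = [-2.0, -1.0, 0.0, 1.0, 2.0]
weat = [2.1, 0.9, 0.0, -1.1, -1.9]
animus = [1.9, 1.2, 0.0, -1.2, -1.9]
print(pearson(weat, animus))                 # strong raw correlation
print(partial_correlation(weat, animus, z))  # approximately zero once z is controlled
```

Dividing by the pooled standard deviation makes the WEAT score scale-free, so scores from corpora of different regions can be compared. The toy partial correlation mirrors the abstract's argument in miniature: when relative name frequency drives both the WEAT score and the animus measure, their raw correlation can be strong while the correlation that remains after controlling for frequency is near zero.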
