Negative Associations in Word Embeddings Predict Anti-black Bias across Regions - but Only via Name Frequency.

Proceedings of the ... International AAAI Conference on Weblogs and Social Media. Pub Date: 2022-05-31. DOI: 10.1609/icwsm.v16i1.19399

Austin van Loon, Salvatore Giorgi, Robb Willer, Johannes Eichstaedt

Volume 16, pages 1419-1424. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10147343/pdf/nihms-1842382.pdf

Citations: 3

Abstract

The word embedding association test (WEAT) is an important method for measuring linguistic biases against social groups, such as ethnic minorities, in large text corpora. It does so by comparing the semantic relatedness of words prototypical of the groups (e.g., names unique to those groups) with attribute words (e.g., 'pleasant' and 'unpleasant' words). We show that anti-Black WEAT estimates from geo-tagged social media data at the level of metropolitan statistical areas strongly correlate with several measures of racial animus, even when controlling for sociodemographic covariates. However, we also show that every one of these correlations is explained by a third variable: the frequency of Black names in the underlying corpora relative to White names. This occurs because word embeddings tend to place positive words near frequent words, and negative words near rare words, in the estimated semantic space. Because the frequency of Black names on social media is strongly correlated with Black Americans' prevalence in the population, this results in spuriously high anti-Black WEAT estimates wherever few Black Americans live. This suggests that research using the WEAT to measure bias should account for term frequency, and it also demonstrates the potential consequences of using black-box models like word embeddings to study human cognition and behavior.
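The two ideas in the abstract, comparing name vectors against attribute vectors and checking whether a correlation survives once name frequency is controlled for, can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the effect size follows the standard WEAT form (difference in mean association divided by the pooled standard deviation), the 2-D vectors are hand-made toy embeddings, and the regional data are synthetic numbers generated under the assumption that both the WEAT score and the animus measure depend on a hypothetical `freq` variable and on nothing else. The names `weat_effect_size` and `partial_corr` are illustrative helpers.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """Mean similarity of w to attribute set A minus mean similarity to set B."""
    return float(np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B]))

def weat_effect_size(X, Y, A, B):
    """Standard WEAT effect size for target sets X, Y and attribute sets A, B."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return float((np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1))

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing z out of both."""
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

# Toy 2-D "embeddings" (assumed, not real): axis 0 ~ pleasant, axis 1 ~ unpleasant.
pleasant = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
unpleasant = [np.array([0.0, 1.0]), np.array([0.1, 0.9])]
names_x = [np.array([1.0, 0.2]), np.array([0.8, 0.1])]  # names placed near "pleasant"
names_y = [np.array([0.2, 1.0]), np.array([0.1, 0.8])]  # names placed near "unpleasant"
d = weat_effect_size(names_x, names_y, pleasant, unpleasant)

# Synthetic regions (assumed): both measures are driven only by relative name frequency.
rng = np.random.default_rng(0)
freq = rng.normal(size=500)                       # hypothetical log frequency ratio
weat = -freq + rng.normal(scale=0.5, size=500)    # rarer names -> higher WEAT estimate
animus = -freq + rng.normal(scale=0.5, size=500)  # assumed to depend on freq only
raw_r = float(np.corrcoef(weat, animus)[0, 1])
partial_r = partial_corr(weat, animus, freq)
print(f"effect size {d:.2f}, raw r {raw_r:.2f}, partial r {partial_r:.2f}")
```

In this synthetic setup the raw correlation between `weat` and `animus` is large, while the partial correlation given `freq` is close to zero, which is the pattern the abstract describes as a correlation "explained by a third variable". Residualizing by ordinary least squares is one simple way to control for a covariate; the abstract does not say which specification the authors used.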



