Quantifying Bias in a Face Verification System

Megan Frisella, Pooya Khorrami, J. Matterer, K. Kratkiewicz, P. Torres-Carrasquillo
DOI: 10.3390/cmsf2022003006 (https://doi.org/10.3390/cmsf2022003006)
Venue: AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)
Published: 2022-04-20
Citations: 1

Abstract

Machine learning models perform face verification (FV) for a variety of highly consequential applications, such as biometric authentication, face identification, and surveillance. Many state-of-the-art FV systems perform unequally across demographic groups, a problem commonly overlooked by evaluation measures that do not assess population-specific performance. Biased deployed systems can seriously harm the individuals or groups on whom they underperform. We explore several fairness definitions and metrics in an attempt to quantify bias in Google's FaceNet model. In addition to statistical fairness metrics, we analyze clustered face embeddings produced by the FV model. We link well-clustered embeddings (well-defined, dense clusters) for a demographic group to biased model performance against that group. We present the intuition that FV systems underperform on protected demographic groups because they are less sensitive to differences between features within those groups, as evidenced by clustered embeddings. We show how this performance discrepancy results from a combination of representation and aggregation bias.

[...] death times for White face embeddings to be later than those of other race groups (p < 0.05 for the W × A, W × I, and W × B t-tests), indicating that White embeddings are more [spread out] in the embedding space. The peaks in the death-time distributions of the other race groups are taller and occur earlier than the peak for the White race group. The shorter, wider peak for the White subgroup means that its H_0 death times vary more (higher variance), whereas the other race groups show a consistent peak around 0.8 with less variance. This shows that White face embeddings have higher variance in the embedding space than those of other race groups, a trend that was not present in the centroid distance distributions for race groups, which showed four bell-shaped density plots. Thus, our analysis of the H_0 death times supports previous findings that the White race group is clustered differently from other race groups. We note that there is less inequality in H_0 death times between female and male faces, although our p-value indicates that this discrepancy may be significant (p < 0.05).
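
The H_0 death-time comparison described above can be illustrated with a small sketch. It relies on a standard fact: the finite H_0 death times of a Vietoris-Rips filtration are the edge lengths of the minimum spanning tree of the point cloud. Everything else here is an assumption made for illustration and is not taken from the paper: the function names `h0_death_times`, `welch_t`, and `toy_cloud`, the Euclidean metric, the convention that a component dies at the full edge length (some libraries report half of it), the choice of Welch's unequal-variance t-test, and the synthetic 2-D point clouds that stand in for face embeddings.

```python
import math
import random
from statistics import mean, variance


def h0_death_times(points):
    """Finite H0 death times of a Vietoris-Rips filtration on `points`.

    These equal the edge lengths of the minimum spanning tree (MST),
    built here with Prim's algorithm on Euclidean distances.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    in_tree[0] = True  # grow the tree from the first point
    # best[v] = shortest known distance from v to any point already in the tree
    best = [math.dist(points[0], p) for p in points]
    deaths = []
    for _ in range(n - 1):
        # attach the outside point that is closest to the tree
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        deaths.append(best[u])  # two components merge at this scale
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return sorted(deaths)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def toy_cloud(n, spread, seed):
    """Synthetic 2-D Gaussian cloud (hypothetical stand-in for embeddings)."""
    rng = random.Random(seed)
    return [(rng.gauss(0, spread), rng.gauss(0, spread)) for _ in range(n)]


# A tightly clustered group versus a more spread-out group (synthetic data).
tight = h0_death_times(toy_cloud(60, 0.5, seed=0))
loose = h0_death_times(toy_cloud(60, 2.0, seed=1))
t_stat, dof = welch_t(loose, tight)
print(f"mean death time: tight={mean(tight):.3f}, loose={mean(loose):.3f}")
print(f"Welch t={t_stat:.2f}, df={dof:.1f}")
```

A group whose embeddings are more spread out should show later and more variable death times, which is the pattern the excerpt reports for the White race group. Turning the t statistic into a p-value requires the CDF of Student's t distribution, which the Python standard library does not provide, so the sketch stops at the statistic and degrees of freedom.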