When More Data Lead Us Astray: Active Data Acquisition in the Presence of Label Bias

Yunyi Li, Maria De-Arteaga, M. Saar-Tsechansky
{"title":"When More Data Lead Us Astray: Active Data Acquisition in the Presence of Label Bias","authors":"Yunyi Li, Maria De-Arteaga, M. Saar-Tsechansky","doi":"10.1609/hcomp.v10i1.21994","DOIUrl":null,"url":null,"abstract":"An increased awareness concerning risks of algorithmic bias has driven a surge of efforts around bias mitigation strategies. A vast majority of the proposed approaches fall under one of two categories: (1) imposing algorithmic fairness constraints on predictive models, and (2) collecting additional training samples. Most recently and at the intersection of these two categories, methods that propose active learning under fairness constraints have been developed. However, proposed bias mitigation strategies typically overlook the bias presented in the observed labels. In this work, we study fairness considerations of active data collection strategies in the presence of label bias. We first present an overview of different types of label bias in the context of supervised learning systems. We then empirically show that, when overlooking label bias, collecting more data can aggravate bias, and imposing fairness constraints that rely on the observed labels in the data collection process may not address the problem. Our results illustrate the unintended consequences of deploying a model that attempts to mitigate a single type of bias while neglecting others, emphasizing the importance of explicitly differentiating between the types of bias that fairness-aware algorithms aim to address, and highlighting the risks of neglecting label bias during data collection.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/hcomp.v10i1.21994","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Increased awareness of the risks of algorithmic bias has driven a surge of efforts around bias mitigation strategies. The vast majority of proposed approaches fall under one of two categories: (1) imposing algorithmic fairness constraints on predictive models, and (2) collecting additional training samples. Most recently, at the intersection of these two categories, methods that perform active learning under fairness constraints have been developed. However, proposed bias mitigation strategies typically overlook the bias present in the observed labels. In this work, we study fairness considerations of active data collection strategies in the presence of label bias. We first present an overview of different types of label bias in the context of supervised learning systems. We then empirically show that, when label bias is overlooked, collecting more data can aggravate bias, and that imposing fairness constraints that rely on the observed labels during data collection may not address the problem. Our results illustrate the unintended consequences of deploying a model that attempts to mitigate a single type of bias while neglecting others, emphasizing the importance of explicitly differentiating between the types of bias that fairness-aware algorithms aim to address, and highlighting the risks of neglecting label bias during data collection.
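To make the phenomenon concrete, the sketch below is a minimal illustrative simulation, not the authors' experimental setup: it assumes a synthetic population, a hypothetical 40% label-flip rate for positives in one group, and plain uncertainty-sampling active learning with logistic regression. Because the learner queries and fits on the biased observed labels, the model it produces can underperform for the disadvantaged group when evaluated against the true labels, even as more data are collected.

```python
# Illustrative sketch only: synthetic data, arbitrary bias rate, and a simple
# uncertainty-sampling loop; all modeling choices here are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic population with a protected group and biased observed labels."""
    group = rng.integers(0, 2, n)                  # protected attribute (0 or 1)
    x = rng.normal(size=(n, 2)) + group[:, None]   # features correlate with group
    y_true = (x[:, 0] + x[:, 1] > 1).astype(int)   # unobserved ground truth
    # Label bias: 40% of true positives in group 1 are recorded as negatives.
    flip = (group == 1) & (y_true == 1) & (rng.random(n) < 0.4)
    y_obs = np.where(flip, 0, y_true)
    return x, group, y_true, y_obs

X_pool, g_pool, y_true_pool, y_obs_pool = make_data(5000)
X_test, g_test, y_true_test, _ = make_data(2000)

# Small labeled seed, then uncertainty sampling on the *observed* (biased) labels.
labeled = list(rng.choice(len(X_pool), 50, replace=False))
for _ in range(200):
    clf = LogisticRegression().fit(X_pool[labeled], y_obs_pool[labeled])
    margin = np.abs(clf.predict_proba(X_pool)[:, 1] - 0.5)
    margin[labeled] = np.inf                       # never re-query a labeled point
    labeled.append(int(np.argmin(margin)))         # query the most uncertain point

# Final model trained on the actively collected, biased labels.
clf = LogisticRegression().fit(X_pool[labeled], y_obs_pool[labeled])
pred = clf.predict(X_test)
for g in (0, 1):
    pos = (g_test == g) & (y_true_test == 1)
    print(f"group {g}: TPR vs. true labels = {pred[pos].mean():.2f}")
```

In this toy setup, comparing the per-group true-positive rates against the (unobserved) ground truth is what exposes the harm; any fairness constraint computed from the observed labels alone would not see it, which is the gap the paper highlights.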