When More Data Lead Us Astray: Active Data Acquisition in the Presence of Label Bias

Yunyi Li, Maria De-Arteaga, M. Saar-Tsechansky
{"title":"When More Data Lead Us Astray: Active Data Acquisition in the Presence of Label Bias","authors":"Yunyi Li, Maria De-Arteaga, M. Saar-Tsechansky","doi":"10.1609/hcomp.v10i1.21994","DOIUrl":null,"url":null,"abstract":"An increased awareness concerning risks of algorithmic bias has driven a surge of efforts around bias mitigation strategies. A vast majority of the proposed approaches fall under one of two categories: (1) imposing algorithmic fairness constraints on predictive models, and (2) collecting additional training samples. Most recently and at the intersection of these two categories, methods that propose active learning under fairness constraints have been developed. However, proposed bias mitigation strategies typically overlook the bias presented in the observed labels. In this work, we study fairness considerations of active data collection strategies in the presence of label bias. We first present an overview of different types of label bias in the context of supervised learning systems. We then empirically show that, when overlooking label bias, collecting more data can aggravate bias, and imposing fairness constraints that rely on the observed labels in the data collection process may not address the problem. Our results illustrate the unintended consequences of deploying a model that attempts to mitigate a single type of bias while neglecting others, emphasizing the importance of explicitly differentiating between the types of bias that fairness-aware algorithms aim to address, and highlighting the risks of neglecting label bias during data collection.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/hcomp.v10i1.21994","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Increased awareness of the risks of algorithmic bias has driven a surge of efforts around bias mitigation strategies. The vast majority of proposed approaches fall under one of two categories: (1) imposing algorithmic fairness constraints on predictive models, and (2) collecting additional training samples. Most recently, at the intersection of these two categories, methods that perform active learning under fairness constraints have been developed. However, proposed bias mitigation strategies typically overlook the bias present in the observed labels. In this work, we study fairness considerations of active data collection strategies in the presence of label bias. We first present an overview of different types of label bias in the context of supervised learning systems. We then empirically show that, when label bias is overlooked, collecting more data can aggravate bias, and that imposing fairness constraints that rely on the observed labels during data collection may not address the problem. Our results illustrate the unintended consequences of deploying a model that attempts to mitigate a single type of bias while neglecting others, emphasizing the importance of explicitly differentiating between the types of bias that fairness-aware algorithms aim to address, and highlighting the risks of neglecting label bias during data collection.
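To make the phenomenon concrete, the sketch below is a minimal illustrative simulation, not the authors' experimental setup: it assumes a synthetic population, a hypothetical 40% label-flip rate for positives in one group, and plain uncertainty-sampling active learning with logistic regression. Because the learner queries and fits on the biased observed labels, the model it produces can underperform for the disadvantaged group when evaluated against the true labels, even as more data are collected.

```python
# Illustrative sketch only: synthetic data, arbitrary bias rate, and a simple
# uncertainty-sampling loop; all modeling choices here are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic population with a protected group and biased observed labels."""
    group = rng.integers(0, 2, n)                  # protected attribute (0 or 1)
    x = rng.normal(size=(n, 2)) + group[:, None]   # features correlate with group
    y_true = (x[:, 0] + x[:, 1] > 1).astype(int)   # unobserved ground truth
    # Label bias: 40% of true positives in group 1 are recorded as negatives.
    flip = (group == 1) & (y_true == 1) & (rng.random(n) < 0.4)
    y_obs = np.where(flip, 0, y_true)
    return x, group, y_true, y_obs

X_pool, g_pool, y_true_pool, y_obs_pool = make_data(5000)
X_test, g_test, y_true_test, _ = make_data(2000)

# Small labeled seed, then uncertainty sampling on the *observed* (biased) labels.
labeled = list(rng.choice(len(X_pool), 50, replace=False))
for _ in range(200):
    clf = LogisticRegression().fit(X_pool[labeled], y_obs_pool[labeled])
    margin = np.abs(clf.predict_proba(X_pool)[:, 1] - 0.5)
    margin[labeled] = np.inf                       # never re-query a labeled point
    labeled.append(int(np.argmin(margin)))         # query the most uncertain point

# Final model trained on the actively collected, biased labels.
clf = LogisticRegression().fit(X_pool[labeled], y_obs_pool[labeled])
pred = clf.predict(X_test)
for g in (0, 1):
    pos = (g_test == g) & (y_true_test == 1)
    print(f"group {g}: TPR vs. true labels = {pred[pos].mean():.2f}")
```

In this toy setup, comparing the per-group true-positive rates against the (unobserved) ground truth is what exposes the harm; any fairness constraint computed from the observed labels alone would not see it, which is the gap the paper highlights.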