Measuring the Biases that Matter: The Ethical and Causal Foundations for Measures of Fairness in Algorithms

Bruce Glymour, J. Herington
{"title":"Measuring the Biases that Matter: The Ethical and Casual Foundations for Measures of Fairness in Algorithms","authors":"Bruce Glymour, J. Herington","doi":"10.1145/3287560.3287573","DOIUrl":null,"url":null,"abstract":"Measures of algorithmic bias can be roughly classified into four categories, distinguished by the conditional probabilistic dependencies to which they are sensitive. First, measures of \"procedural bias\" diagnose bias when the score returned by an algorithm is probabilistically dependent on a sensitive class variable (e.g. race or sex). Second, measures of \"outcome bias\" capture probabilistic dependence between class variables and the outcome for each subject (e.g. parole granted or loan denied). Third, measures of \"behavior-relative error bias\" capture probabilistic dependence between class variables and the algorithmic score, conditional on target behaviors (e.g. recidivism or loan default). Fourth, measures of \"score-relative error bias\" capture probabilistic dependence between class variables and behavior, conditional on score. Several recent discussions have demonstrated a tradeoff between these different measures of algorithmic bias, and at least one recent paper has suggested conditions under which tradeoffs may be minimized. In this paper we use the machinery of causal graphical models to show that, under standard assumptions, the underlying causal relations among variables forces some tradeoffs. We delineate a number of normative considerations that are encoded in different measures of bias, with reference to the philosophical literature on the wrongfulness of disparate treatment and disparate impact. While both kinds of error bias are nominally motivated by concern to avoid disparate impact, we argue that consideration of causal structures shows that these measures are better understood as complicated and unreliable measures of procedural biases (i.e. disparate treatment). Moreover, while procedural bias is indicative of disparate treatment, we show that the measure of procedural bias one ought to adopt is dependent on the account of the wrongfulness of disparate treatment one endorses. Finally, given that neither score-relative nor behavior-relative measures of error bias capture the relevant normative considerations, we suggest that error bias proper is best measured by score-based measures of accuracy, such as the Brier score.","PeriodicalId":20573,"journal":{"name":"Proceedings of the Conference on Fairness, Accountability, and Transparency","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"48","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Conference on Fairness, Accountability, and Transparency","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3287560.3287573","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 48

Abstract

Measures of algorithmic bias can be roughly classified into four categories, distinguished by the conditional probabilistic dependencies to which they are sensitive. First, measures of "procedural bias" diagnose bias when the score returned by an algorithm is probabilistically dependent on a sensitive class variable (e.g. race or sex). Second, measures of "outcome bias" capture probabilistic dependence between class variables and the outcome for each subject (e.g. parole granted or loan denied). Third, measures of "behavior-relative error bias" capture probabilistic dependence between class variables and the algorithmic score, conditional on target behaviors (e.g. recidivism or loan default). Fourth, measures of "score-relative error bias" capture probabilistic dependence between class variables and behavior, conditional on score. Several recent discussions have demonstrated a tradeoff between these different measures of algorithmic bias, and at least one recent paper has suggested conditions under which tradeoffs may be minimized. In this paper we use the machinery of causal graphical models to show that, under standard assumptions, the underlying causal relations among variables force some tradeoffs. We delineate a number of normative considerations that are encoded in different measures of bias, with reference to the philosophical literature on the wrongfulness of disparate treatment and disparate impact. While both kinds of error bias are nominally motivated by a concern to avoid disparate impact, we argue that consideration of causal structures shows that these measures are better understood as complicated and unreliable measures of procedural bias (i.e. disparate treatment). Moreover, while procedural bias is indicative of disparate treatment, we show that the measure of procedural bias one ought to adopt depends on the account of the wrongfulness of disparate treatment one endorses. Finally, given that neither score-relative nor behavior-relative measures of error bias capture the relevant normative considerations, we suggest that error bias proper is best measured by score-based measures of accuracy, such as the Brier score.
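The four categories correspond to different conditional (in)dependence conditions. As a minimal formalization in our own notation (not taken verbatim from the paper), write $A$ for the sensitive class variable, $S$ for the algorithmic score, $Y$ for the target behavior, and $O$ for the outcome; then, roughly:

procedural bias: $S \not\perp A$
outcome bias: $O \not\perp A$
behavior-relative error bias: $S \not\perp A \mid Y$
score-relative error bias: $Y \not\perp A \mid S$

The abstract's closing suggestion is that error bias proper should be assessed with score-based accuracy measures such as the Brier score, i.e. the mean squared difference between predicted probabilities and realized binary outcomes. A minimal sketch in Python (the per-group comparison and all variable names are illustrative assumptions, not the authors' procedure):

import numpy as np

def brier_score(probs, outcomes):
    # Mean squared difference between the predicted probability of the
    # event and the realized binary outcome (0 = no event, 1 = event).
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((probs - outcomes) ** 2))

# Hypothetical data: predicted recidivism probabilities and observed
# behavior, split by a sensitive class variable A.
probs_a, y_a = [0.2, 0.7, 0.9, 0.4], [0, 1, 1, 0]
probs_b, y_b = [0.3, 0.6, 0.8, 0.5], [0, 1, 1, 1]

print(brier_score(probs_a, y_a))  # score accuracy within group A -> 0.075
print(brier_score(probs_b, y_b))  # score accuracy within group B -> 0.135

Lower Brier scores indicate more accurate probability estimates; comparing the score within each group gives one way to quantify how unevenly accurate an algorithm's scores are across groups.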