Disparate Interactions: An Algorithm-in-the-Loop Analysis of Fairness in Risk Assessments

Ben Green, Yiling Chen
Proceedings of the Conference on Fairness, Accountability, and Transparency, published 2019-01-29. DOI: 10.1145/3287560.3287563 (https://doi.org/10.1145/3287560.3287563)
Citations: 200

Abstract

Despite vigorous debates about the technical characteristics of risk assessments being deployed in the U.S. criminal justice system, remarkably little research has studied how these tools affect actual decision-making processes. After all, risk assessments do not make definitive decisions; they inform judges, who are the final arbiters. It is therefore essential that considerations of risk assessments be informed by rigorous studies of how judges actually interpret and use them. This paper takes a first step toward such research on human interactions with risk assessments through a controlled experimental study on Amazon Mechanical Turk. We found several behaviors that call into question the supposed efficacy and fairness of risk assessments: our study participants 1) underperformed the risk assessment even when presented with its predictions, 2) could not effectively evaluate the accuracy of their own or the risk assessment's predictions, and 3) exhibited behaviors fraught with "disparate interactions," whereby the use of risk assessments led to higher risk predictions about black defendants and lower risk predictions about white defendants. These results suggest the need for a new "algorithm-in-the-loop" framework that places machine learning decision-making aids into the sociotechnical context of improving human decisions rather than the technical context of generating the best prediction in the abstract. If risk assessments are to be used at all, they must be grounded in rigorous evaluations of their real-world impacts instead of in their theoretical potential.
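The idea of a "disparate interaction" can be made concrete with a small calculation: compare how much the risk assessment shifts human predictions for one group versus another. The sketch below is one simple way to quantify this, assuming that for each defendant we have the average predicted risk from participants who did not see the risk assessment and from participants who did. The record layout, the function names `shift_by_group` and `disparate_interaction`, and the numbers in the usage lines are all hypothetical illustrations; they are not data from the study and not necessarily the statistical analysis the authors used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record layout, one tuple per defendant:
#   (group label,
#    mean predicted risk from participants who did NOT see the risk assessment,
#    mean predicted risk from participants who DID see it)
Record = tuple[str, float, float]


def shift_by_group(records: list[Record]) -> dict[str, float]:
    """Average per-defendant change in predicted risk caused by showing the aid."""
    shifts: dict[str, list[float]] = defaultdict(list)
    for group, without_ra, with_ra in records:
        # Positive shift: seeing the risk assessment raised the prediction.
        shifts[group].append(with_ra - without_ra)
    return {group: mean(values) for group, values in shifts.items()}


def disparate_interaction(records: list[Record], group_a: str, group_b: str) -> float:
    """Difference in mean shift between two groups.

    A positive value means the risk assessment pushed predictions for
    group_a upward relative to group_b; zero means equal average shifts.
    """
    shifts = shift_by_group(records)
    return shifts[group_a] - shifts[group_b]


if __name__ == "__main__":
    # Made-up illustrative inputs, NOT results from the paper.
    toy = [
        ("black", 0.40, 0.48),
        ("black", 0.30, 0.34),
        ("white", 0.45, 0.41),
        ("white", 0.35, 0.33),
    ]
    print(round(disparate_interaction(toy, "black", "white"), 3))  # prints 0.09
```

Pairing by defendant, rather than pooling all predictions per condition, keeps the comparison focused on the effect of showing the aid for the same case; a real analysis would also need uncertainty estimates, which this sketch omits.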