Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective

Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society
Published: 2021-05-20
DOI: 10.1145/3461702.3462603

Flavien Prost, Pranjal Awasthi, Nicholas Blumm, A. Kumthekar, Trevor Potter, Li Wei, Xuezhi Wang, Ed H. Chi, Jilin Chen, Alex Beutel

Citations: 9

Abstract

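In this work we study the problem of measuring the fairness of a machine learning model under noisy information. Focusing on group fairness metrics, we investigate the particular but common situation in which the evaluation requires controlling for the confounding effect of covariate variables. In practice, we may not be able to observe the covariate and the group information jointly, and a standard workaround is to use proxies for one or more of these variables. Prior work has demonstrated the challenges of using a proxy for sensitive attributes, where strong independence assumptions are needed to guarantee the accuracy of the noisy estimates. In contrast, in this work we study using a proxy for the covariate variable and present a theoretical analysis that aims to characterize weaker conditions under which accurate fairness evaluation is possible. Furthermore, our theory identifies potential sources of error and decouples them into two interpretable parts, γ and ε. The first part, γ, depends solely on the performance of the proxy, such as its precision and recall, whereas the second part, ε, captures correlations between all the variables of interest. We show that in many scenarios the estimation error is dominated by γ through a linear dependence, whereas the dependence on the correlations ε constitutes only a lower-order term. As a result, we expand the understanding of the scenarios in which measuring model fairness via proxies can be an effective approach. Finally, we use simulations to compare the theoretical upper bounds with the distribution of simulated estimation errors, and show that assuming some structure on the data, even weak structure, is key to significantly improving both the theoretical guarantees and the empirical results.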
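
To make the setting concrete, here is a minimal Python sketch of measuring a covariate-controlled group fairness gap when only a noisy proxy for the covariate is available. Everything in it is an illustrative assumption rather than the authors' method: the hypothetical `generate` function and its data-generating process, the choice of metric (difference in positive-prediction rate between groups within the stratum Z = 1), and the proxy noise model (a fixed recall and false-positive rate `fpr`). It does not compute the paper's error terms; it only shows how proxy errors can enter the estimate.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    a: int      # sensitive group (0 or 1)
    z: int      # true covariate stratum we want to control for (0 or 1)
    z_hat: int  # noisy proxy for z
    y_hat: int  # model prediction (1 = positive outcome)


def generate(n, recall, fpr, seed=0):
    """Hypothetical data-generating process, chosen only for illustration."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        a = int(rng.random() < 0.5)
        # The covariate is correlated with group membership.
        z = int(rng.random() < (0.6 if a == 1 else 0.3))
        # Proxy fires on z = 1 with prob. `recall`, on z = 0 with prob. `fpr`.
        z_hat = int(rng.random() < (recall if z == 1 else fpr))
        # Positive rate depends mostly on the covariate, slightly on group.
        y_hat = int(rng.random() < 0.2 + 0.5 * z + 0.05 * a)
        samples.append(Sample(a, z, z_hat, y_hat))
    return samples


def positive_rate(samples, group, in_stratum):
    """Estimate P(y_hat = 1 | a = group, in_stratum(sample))."""
    preds = [s.y_hat for s in samples if s.a == group and in_stratum(s)]
    if not preds:
        raise ValueError("no samples in this group/stratum cell")
    return sum(preds) / len(preds)


def conditional_gap(samples, in_stratum):
    """Covariate-controlled fairness gap: rate(group 1) - rate(group 0)."""
    return (positive_rate(samples, 1, in_stratum)
            - positive_rate(samples, 0, in_stratum))


def proxy_quality(samples):
    """Precision and recall of z_hat as a detector of z = 1."""
    tp = sum(1 for s in samples if s.z == 1 and s.z_hat == 1)
    predicted = sum(1 for s in samples if s.z_hat == 1)
    actual = sum(1 for s in samples if s.z == 1)
    return tp / predicted, tp / actual


def estimation_error(samples):
    """|gap measured with the proxy - gap measured with the true covariate|."""
    true_gap = conditional_gap(samples, lambda s: s.z == 1)
    proxy_gap = conditional_gap(samples, lambda s: s.z_hat == 1)
    return abs(proxy_gap - true_gap)


if __name__ == "__main__":
    for recall, fpr in [(1.0, 0.0), (0.9, 0.1), (0.7, 0.3)]:
        data = generate(100_000, recall, fpr, seed=1)
        precision, rec = proxy_quality(data)
        print(f"recall={recall} fpr={fpr}: precision={precision:.3f} "
              f"recall_hat={rec:.3f} error={estimation_error(data):.4f}")
```

With a perfect proxy (`recall=1.0`, `fpr=0.0`) the proxy stratum coincides with the true one, so the estimation error is exactly zero. With a noisy proxy, the error in this toy setup comes both from the proxy's imperfect precision and from the correlation between group membership and the covariate, since that correlation makes the proxy's precision differ between the two groups. This loosely echoes the abstract's split between proxy performance and correlations among the variables, but the sketch makes no claim about the paper's bounds.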
