Measuring Group Advantage: A Comparative Study of Fair Ranking Metrics
C. Kuhlman, Walter Gerych, Elke A. Rundensteiner
DOI: https://doi.org/10.1145/3461702.3462588

Ranking evaluation metrics play an important role in information retrieval, providing optimization objectives during development and means of assessment of deployed performance. Recently, fairness of rankings has been recognized as crucial, especially as automated systems are increasingly used for high impact decisions. While numerous fairness metrics have been proposed, a comparative analysis to understand their interrelationships is lacking. Even for fundamental statistical parity metrics which measure group advantage, it remains unclear whether metrics measure the same phenomena, or when one metric may produce different results than another. To address these open questions, we formulate a conceptual framework for analytical comparison of metrics. We prove that under reasonable assumptions, popular metrics in the literature exhibit the same behavior and that optimizing for one optimizes for all. However, our analysis also shows that the metrics vary in the degree of unfairness measured, in particular when one group has a strong majority. Based on this analysis, we design a practical statistical test to identify whether observed data is likely to exhibit predictable group bias. We provide a set of recommendations for practitioners to guide the choice of an appropriate fairness metric.

Rawlsian Fair Adaptation of Deep Learning Classifiers
Kulin Shah, Pooja Gupta, A. Deshpande, C. Bhattacharyya
DOI: https://doi.org/10.1145/3461702.3462592

Group-fairness in classification aims for equality of a predictive utility across different sensitive sub-populations, e.g., race or gender. Equality or near-equality constraints in group-fairness often worsen not only the aggregate utility but also the utility for the least advantaged sub-population. In this paper, we apply the principles of Pareto-efficiency and least-difference to the utility being accuracy, as an illustrative example, and arrive at the Rawls classifier that minimizes the error rate on the worst-off sensitive sub-population. Our mathematical characterization shows that the Rawls classifier uniformly applies a threshold to an ideal score of features, in the spirit of fair equality of opportunity. In practice, such a score or a feature representation is often computed by a black-box model that has been useful but unfair. Our second contribution is practical Rawlsian fair adaptation of any given black-box deep learning model, without changing the score or feature representation it computes. Given any score function or feature representation and only its second-order statistics on the sensitive sub-populations, we seek a threshold classifier on the given score or a linear threshold classifier on the given feature representation that achieves the Rawls error rate restricted to this hypothesis class. Our technical contribution is to formulate the above problems using ambiguous chance constraints, and to provide efficient algorithms for Rawlsian fair adaptation, along with provable upper bounds on the Rawls error rate. Our empirical results show significant improvement over state-of-the-art group-fair algorithms, even without retraining for fairness.
{"title":"Rawlsian Fair Adaptation of Deep Learning Classifiers","authors":"Kulin Shah, Pooja Gupta, A. Deshpande, C. Bhattacharyya","doi":"10.1145/3461702.3462592","DOIUrl":"https://doi.org/10.1145/3461702.3462592","url":null,"abstract":"Group-fairness in classification aims for equality of a predictive utility across different sensitive sub-populations, e.g., race or gender. Equality or near-equality constraints in group-fairness often worsen not only the aggregate utility but also the utility for the least advantaged sub-population. In this paper, we apply the principles of Pareto-efficiency and least-difference to the utility being accuracy, as an illustrative example, and arrive at the Rawls classifier that minimizes the error rate on the worst-off sensitive sub-population. Our mathematical characterization shows that the Rawls classifier uniformly applies a threshold to an ideal score of features, in the spirit of fair equality of opportunity. In practice, such a score or a feature representation is often computed by a black-box model that has been useful but unfair. Our second contribution is practical Rawlsian fair adaptation of any given black-box deep learning model, without changing the score or feature representation it computes. Given any score function or feature representation and only its second-order statistics on the sensitive sub-populations, we seek a threshold classifier on the given score or a linear threshold classifier on the given feature representation that achieves the Rawls error rate restricted to this hypothesis class. Our technical contribution is to formulate the above problems using ambiguous chance constraints, and to provide efficient algorithms for Rawlsian fair adaptation, along with provable upper bounds on the Rawls error rate. Our empirical results show significant improvement over state-of-the-art group-fair algorithms, even without retraining for fairness.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127840314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Computer Vision and Conflicting Values: Describing People with Automated Alt Text
Margot Hanley, Solon Barocas, K. Levy, Shiri Azenkot, H. Nissenbaum
DOI: https://doi.org/10.1145/3461702.3462620

Scholars have recently drawn attention to a range of controversial issues posed by the use of computer vision for automatically generating descriptions of people in images. Despite these concerns, automated image description has become an important tool to ensure equitable access to information for blind and low vision people. In this paper, we investigate the ethical dilemmas faced by companies that have adopted the use of computer vision for producing alt text: textual descriptions of images for blind and low vision people. We use Facebook's automatic alt text tool as our primary case study. First, we analyze the policies that Facebook has adopted with respect to identity categories, such as race, gender, age, etc., and the company's decisions about whether to present these terms in alt text. We then describe an alternative---and manual---approach practiced in the museum community, focusing on how museums determine what to include in alt text descriptions of cultural artifacts. We compare these policies, using notable points of contrast to develop an analytic framework that characterizes the particular apprehensions behind these policy choices. We conclude by considering two strategies that seem to sidestep some of these concerns, finding that there are no easy ways to avoid the normative dilemmas posed by the use of computer vision to automate alt text.
{"title":"Computer Vision and Conflicting Values: Describing People with Automated Alt Text","authors":"Margot Hanley, Solon Barocas, K. Levy, Shiri Azenkot, H. Nissenbaum","doi":"10.1145/3461702.3462620","DOIUrl":"https://doi.org/10.1145/3461702.3462620","url":null,"abstract":"Scholars have recently drawn attention to a range of controversial issues posed by the use of computer vision for automatically generating descriptions of people in images. Despite these concerns, automated image description has become an important tool to ensure equitable access to information for blind and low vision people. In this paper, we investigate the ethical dilemmas faced by companies that have adopted the use of computer vision for producing alt text: textual descriptions of images for blind and low vision people. We use Facebook's automatic alt text tool as our primary case study. First, we analyze the policies that Facebook has adopted with respect to identity categories, such as race, gender, age, etc., and the company's decisions about whether to present these terms in alt text. We then describe an alternative---and manual---approach practiced in the museum community, focusing on how museums determine what to include in alt text descriptions of cultural artifacts. We compare these policies, using notable points of contrast to develop an analytic framework that characterizes the particular apprehensions behind these policy choices. We conclude by considering two strategies that seem to sidestep some of these concerns, finding that there are no easy ways to avoid the normative dilemmas posed by the use of computer vision to automate alt text.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130244114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Algorithmic Audit of Italian Car Insurance: Evidence of Unfairness in Access and Pricing
Alessandro Fabris, Alan Mishler, S. Gottardi, Mattia Carletti, Matteo Daicampi, Gian Antonio Susto, Gianmaria Silvello
DOI: https://doi.org/10.1145/3461702.3462569

We conduct an audit of pricing algorithms employed by companies in the Italian car insurance industry, primarily by gathering quotes through a popular comparison website. While acknowledging the complexity of the industry, we find evidence of several problematic practices. We show that birthplace and gender have a direct and sizeable impact on the prices quoted to drivers, despite national and international regulations against their use. Birthplace, in particular, is used quite frequently to the disadvantage of foreign-born drivers and drivers born in certain Italian cities. In extreme cases, a driver born in Laos may be charged €1,000 more than a driver born in Milan, all else being equal. For a subset of our sample, we collect quotes directly on a company website, where the direct influence of gender and birthplace is confirmed. Finally, we find that drivers with riskier profiles tend to see fewer quotes in the aggregator result pages, substantiating concerns of differential treatment raised in the past by Italian insurance regulators.
{"title":"Algorithmic Audit of Italian Car Insurance: Evidence of Unfairness in Access and Pricing","authors":"Alessandro Fabris, Alan Mishler, S. Gottardi, Mattia Carletti, Matteo Daicampi, Gian Antonio Susto, Gianmaria Silvello","doi":"10.1145/3461702.3462569","DOIUrl":"https://doi.org/10.1145/3461702.3462569","url":null,"abstract":"We conduct an audit of pricing algorithms employed by companies in the Italian car insurance industry, primarily by gathering quotes through a popular comparison website. While acknowledging the complexity of the industry, we find evidence of several problematic practices. We show that birthplace and gender have a direct and sizeable impact on the prices quoted to drivers, despite national and international regulations against their use. Birthplace, in particular, is used quite frequently to the disadvantage of foreign-born drivers and drivers born in certain Italian cities. In extreme cases, a driver born in Laos may be charged 1,000 more than a driver born in Milan, all else being equal. For a subset of our sample, we collect quotes directly on a company website, where the direct influence of gender and birthplace is confirmed. Finally, we find that drivers with riskier profiles tend to see fewer quotes in the aggregator result pages, substantiating concerns of differential treatment raised in the past by Italian insurance regulators.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129459390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Risk Identification Questionnaire for Detecting Unintended Bias in the Machine Learning Development Lifecycle
M. S. Lee, Jatinder Singh
DOI: https://doi.org/10.1145/3461702.3462572

Unintended biases in machine learning (ML) models have the potential to introduce undue discrimination and exacerbate social inequalities. The research community has proposed various technical and qualitative methods intended to assist practitioners in assessing these biases. While frameworks for identifying the risks of harm due to unintended biases have been proposed, they have not yet been operationalised into practical tools to assist industry practitioners. In this paper, we link prior work on bias assessment methods to phases of a standard organisational risk management process (RMP), noting a gap in measures for helping practitioners identify bias-related risks. Targeting this gap, we introduce a bias identification methodology and questionnaire, illustrating its application through a real-world, practitioner-led use case. We validate the need and usefulness of the questionnaire through a survey of industry practitioners, which provides insights into their practical requirements and preferences. Our results indicate that such a questionnaire is helpful for proactively uncovering unexpected bias concerns, particularly where it is easy to integrate into existing processes, and facilitates communication with non-technical stakeholders. Ultimately, the effective end-to-end management of ML risks requires a more targeted identification of potential harm and its sources, so that appropriate mitigation strategies can be formulated. Towards this, our questionnaire provides a practical means to assist practitioners in identifying bias-related risks.

Measuring Lay Reactions to Personal Data Markets
Aileen Nielsen
DOI: https://doi.org/10.1145/3461702.3462582

The recording, aggregation, and exchange of personal data is necessary to the development of socially-relevant machine learning applications. However, anecdotal and survey evidence show that ordinary people feel discontent and even anger regarding data collection practices that are currently typical and legal. This suggests that personal data markets in their current form do not adhere to the norms applied by ordinary people. The present study experimentally probes whether market transactions in a typical online scenario are accepted when evaluated by lay people. The results show that a high percentage of study participants refused to participate in a data pricing exercise, even in a commercial context where market rules would typically be expected to apply. For those participants who did price the data, the median price was an order of magnitude higher than the market price. These results call into question the notice and consent market paradigm that is used by technology firms and government regulators when evaluating data flows. The results also point to a conceptual mismatch between cultural and legal expectations regarding the use of personal data.

Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective
Flavien Prost, Pranjal Awasthi, Nicholas Blumm, A. Kumthekar, Trevor Potter, Li Wei, Xuezhi Wang, Ed H. Chi, Jilin Chen, Alex Beutel
DOI: https://doi.org/10.1145/3461702.3462603

In this work we study the problem of measuring the fairness of a machine learning model under noisy information. Focusing on group fairness metrics, we investigate the particular but common situation when the evaluation requires controlling for the confounding effect of covariate variables. In a practical setting, we might not be able to jointly observe the covariate and group information, and a standard workaround is to then use proxies for one or more of these variables. Prior works have demonstrated the challenges with using a proxy for sensitive attributes, and strong independence assumptions are needed to provide guarantees on the accuracy of the noisy estimates. In contrast, in this work we study using a proxy for the covariate variable and present a theoretical analysis that aims to characterize weaker conditions under which accurate fairness evaluation is possible. Furthermore, our theory identifies potential sources of errors and decouples them into two interpretable parts y and E. The first part y depends solely on the performance of the proxy such as precision and recall, whereas the second part E captures correlations between all the variables of interest. We show that in many scenarios the error in the estimates is dominated by y via a linear dependence, whereas the dependence on the correlations E only constitutes a lower order term. As a result we expand the understanding of scenarios where measuring model fairness via proxies can be an effective approach. Finally, we compare, via simulations, the theoretical upper-bounds to the distribution of simulated estimation errors and show that assuming some structure on the data, even weak, is key to significantly improve both theoretical guarantees and empirical results.
{"title":"Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective","authors":"Flavien Prost, Pranjal Awasthi, Nicholas Blumm, A. Kumthekar, Trevor Potter, Li Wei, Xuezhi Wang, Ed H. Chi, Jilin Chen, Alex Beutel","doi":"10.1145/3461702.3462603","DOIUrl":"https://doi.org/10.1145/3461702.3462603","url":null,"abstract":"In this work we study the problem of measuring the fairness of a machine learning model under noisy information. Focusing on group fairness metrics, we investigate the particular but common situation when the evaluation requires controlling for the confounding effect of covariate variables. In a practical setting, we might not be able to jointly observe the covariate and group information, and a standard workaround is to then use proxies for one or more of these variables. Prior works have demonstrated the challenges with using a proxy for sensitive attributes, and strong independence assumptions are needed to provide guarantees on the accuracy of the noisy estimates. In contrast, in this work we study using a proxy for the covariate variable and present a theoretical analysis that aims to characterize weaker conditions under which accurate fairness evaluation is possible. Furthermore, our theory identifies potential sources of errors and decouples them into two interpretable parts y and E. The first part y depends solely on the performance of the proxy such as precision and recall, whereas the second part E captures correlations between all the variables of interest. We show that in many scenarios the error in the estimates is dominated by y via a linear dependence, whereas the dependence on the correlations E only constitutes a lower order term. As a result we expand the understanding of scenarios where measuring model fairness via proxies can be an effective approach. Finally, we compare, via simulations, the theoretical upper-bounds to the distribution of simulated estimation errors and show that assuming some structure on the data, even weak, is key to significantly improve both theoretical guarantees and empirical results.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130593533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Are AI Ethics Conferences Different and More Diverse Compared to Traditional Computer Science Conferences?
Daniel Ernesto Acuna, Lizhen Liang
DOI: https://doi.org/10.1145/3461702.3462616

Even though computer science (CS) has had a historical lack of gender and race representation, its AI research affects everybody eventually. Being partially rooted in CS conferences, "AI ethics" (AIE) conferences such as FAccT and AIES have quickly become distinct venues where AI's societal implications are discussed and solutions proposed. However, it is largely unknown if these conferences improve upon the historical representational issues of traditional CS venues. In this work, we explore AIE conferences' evolution and compare them across demographic characteristics, publication content, and citation patterns. We find that AIE conferences have increased their internal topical diversity and impact on other CS conferences. Importantly, AIE conferences are highly differentiable, covering topics not represented in other venues. However, and perhaps contrary to the field's aspirations, white authors are more common while seniority and black researchers are represented similarly to CS venues. Our results suggest that AIE conferences could increase efforts to attract more diverse authors, especially considering their sizable roots in CS.

More Similar Values, More Trust? - The Effect of Value Similarity on Trust in Human-Agent Interaction
Siddharth Mehrotra, C. Jonker, M. Tielman
DOI: https://doi.org/10.1145/3461702.3462576

As AI systems are increasingly involved in decision making, it also becomes important that they elicit appropriate levels of trust from their users. To achieve this, it is first important to understand which factors influence trust in AI. We identify that a research gap exists regarding the role of personal values in trust in AI. Therefore, this paper studies how human and agent Value Similarity (VS) influences a human's trust in that agent. To explore this, 89 participants teamed up with five different agents, which were designed with varying levels of value similarity to that of the participants. In a within-subjects, scenario-based experiment, agents gave suggestions on what to do when entering the building to save a hostage. We analyzed the agent's scores on subjective value similarity, trust and qualitative data from open-ended questions. Our results show that agents rated as having more similar values also scored higher on trust, indicating a positive effect between the two. With this result, we add to the existing understanding of human-agent trust by providing insight into the role of value-similarity.

AI and Shared Prosperity
Katya Klinova, Anton Korinek
DOI: https://doi.org/10.1145/3461702.3462619

Future advances in AI that automate away human labor may have stark implications for labor markets and inequality. This paper proposes a framework to analyze the effects of specific types of AI systems on the labor market, based on how much labor demand they will create versus displace, while taking into account that productivity gains also make society wealthier and thereby contribute to additional labor demand. This analysis enables ethically-minded companies creating or deploying AI systems as well as researchers and policymakers to take into account the effects of their actions on labor markets and inequality, and therefore to steer progress in AI in a direction that advances shared prosperity and an inclusive economic future for all of humanity.