Measuring Group Advantage: A Comparative Study of Fair Ranking Metrics
C. Kuhlman, Walter Gerych, Elke A. Rundensteiner
DOI: https://doi.org/10.1145/3461702.3462588

Ranking evaluation metrics play an important role in information retrieval, providing optimization objectives during development and means of assessment of deployed performance. Recently, fairness of rankings has been recognized as crucial, especially as automated systems are increasingly used for high impact decisions. While numerous fairness metrics have been proposed, a comparative analysis to understand their interrelationships is lacking. Even for fundamental statistical parity metrics which measure group advantage, it remains unclear whether metrics measure the same phenomena, or when one metric may produce different results than another. To address these open questions, we formulate a conceptual framework for analytical comparison of metrics. We prove that under reasonable assumptions, popular metrics in the literature exhibit the same behavior and that optimizing for one optimizes for all. However, our analysis also shows that the metrics vary in the degree of unfairness measured, in particular when one group has a strong majority. Based on this analysis, we design a practical statistical test to identify whether observed data is likely to exhibit predictable group bias. We provide a set of recommendations for practitioners to guide the choice of an appropriate fairness metric.

Rawlsian Fair Adaptation of Deep Learning Classifiers
Kulin Shah, Pooja Gupta, A. Deshpande, C. Bhattacharyya
DOI: https://doi.org/10.1145/3461702.3462592

Group-fairness in classification aims for equality of a predictive utility across different sensitive sub-populations, e.g., race or gender. Equality or near-equality constraints in group-fairness often worsen not only the aggregate utility but also the utility for the least advantaged sub-population. In this paper, we apply the principles of Pareto-efficiency and least-difference to the utility being accuracy, as an illustrative example, and arrive at the Rawls classifier that minimizes the error rate on the worst-off sensitive sub-population. Our mathematical characterization shows that the Rawls classifier uniformly applies a threshold to an ideal score of features, in the spirit of fair equality of opportunity. In practice, such a score or a feature representation is often computed by a black-box model that has been useful but unfair. Our second contribution is practical Rawlsian fair adaptation of any given black-box deep learning model, without changing the score or feature representation it computes. Given any score function or feature representation and only its second-order statistics on the sensitive sub-populations, we seek a threshold classifier on the given score or a linear threshold classifier on the given feature representation that achieves the Rawls error rate restricted to this hypothesis class. Our technical contribution is to formulate the above problems using ambiguous chance constraints, and to provide efficient algorithms for Rawlsian fair adaptation, along with provable upper bounds on the Rawls error rate. Our empirical results show significant improvement over state-of-the-art group-fair algorithms, even without retraining for fairness.
{"title":"Rawlsian Fair Adaptation of Deep Learning Classifiers","authors":"Kulin Shah, Pooja Gupta, A. Deshpande, C. Bhattacharyya","doi":"10.1145/3461702.3462592","DOIUrl":"https://doi.org/10.1145/3461702.3462592","url":null,"abstract":"Group-fairness in classification aims for equality of a predictive utility across different sensitive sub-populations, e.g., race or gender. Equality or near-equality constraints in group-fairness often worsen not only the aggregate utility but also the utility for the least advantaged sub-population. In this paper, we apply the principles of Pareto-efficiency and least-difference to the utility being accuracy, as an illustrative example, and arrive at the Rawls classifier that minimizes the error rate on the worst-off sensitive sub-population. Our mathematical characterization shows that the Rawls classifier uniformly applies a threshold to an ideal score of features, in the spirit of fair equality of opportunity. In practice, such a score or a feature representation is often computed by a black-box model that has been useful but unfair. Our second contribution is practical Rawlsian fair adaptation of any given black-box deep learning model, without changing the score or feature representation it computes. Given any score function or feature representation and only its second-order statistics on the sensitive sub-populations, we seek a threshold classifier on the given score or a linear threshold classifier on the given feature representation that achieves the Rawls error rate restricted to this hypothesis class. Our technical contribution is to formulate the above problems using ambiguous chance constraints, and to provide efficient algorithms for Rawlsian fair adaptation, along with provable upper bounds on the Rawls error rate. Our empirical results show significant improvement over state-of-the-art group-fair algorithms, even without retraining for fairness.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127840314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Computer Vision and Conflicting Values: Describing People with Automated Alt Text
Margot Hanley, Solon Barocas, K. Levy, Shiri Azenkot, H. Nissenbaum
DOI: https://doi.org/10.1145/3461702.3462620

Scholars have recently drawn attention to a range of controversial issues posed by the use of computer vision for automatically generating descriptions of people in images. Despite these concerns, automated image description has become an important tool to ensure equitable access to information for blind and low vision people. In this paper, we investigate the ethical dilemmas faced by companies that have adopted the use of computer vision for producing alt text: textual descriptions of images for blind and low vision people. We use Facebook's automatic alt text tool as our primary case study. First, we analyze the policies that Facebook has adopted with respect to identity categories, such as race, gender, age, etc., and the company's decisions about whether to present these terms in alt text. We then describe an alternative---and manual---approach practiced in the museum community, focusing on how museums determine what to include in alt text descriptions of cultural artifacts. We compare these policies, using notable points of contrast to develop an analytic framework that characterizes the particular apprehensions behind these policy choices. We conclude by considering two strategies that seem to sidestep some of these concerns, finding that there are no easy ways to avoid the normative dilemmas posed by the use of computer vision to automate alt text.
{"title":"Computer Vision and Conflicting Values: Describing People with Automated Alt Text","authors":"Margot Hanley, Solon Barocas, K. Levy, Shiri Azenkot, H. Nissenbaum","doi":"10.1145/3461702.3462620","DOIUrl":"https://doi.org/10.1145/3461702.3462620","url":null,"abstract":"Scholars have recently drawn attention to a range of controversial issues posed by the use of computer vision for automatically generating descriptions of people in images. Despite these concerns, automated image description has become an important tool to ensure equitable access to information for blind and low vision people. In this paper, we investigate the ethical dilemmas faced by companies that have adopted the use of computer vision for producing alt text: textual descriptions of images for blind and low vision people. We use Facebook's automatic alt text tool as our primary case study. First, we analyze the policies that Facebook has adopted with respect to identity categories, such as race, gender, age, etc., and the company's decisions about whether to present these terms in alt text. We then describe an alternative---and manual---approach practiced in the museum community, focusing on how museums determine what to include in alt text descriptions of cultural artifacts. We compare these policies, using notable points of contrast to develop an analytic framework that characterizes the particular apprehensions behind these policy choices. We conclude by considering two strategies that seem to sidestep some of these concerns, finding that there are no easy ways to avoid the normative dilemmas posed by the use of computer vision to automate alt text.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130244114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Algorithmic Audit of Italian Car Insurance: Evidence of Unfairness in Access and Pricing
Alessandro Fabris, Alan Mishler, S. Gottardi, Mattia Carletti, Matteo Daicampi, Gian Antonio Susto, Gianmaria Silvello
DOI: https://doi.org/10.1145/3461702.3462569

We conduct an audit of pricing algorithms employed by companies in the Italian car insurance industry, primarily by gathering quotes through a popular comparison website. While acknowledging the complexity of the industry, we find evidence of several problematic practices. We show that birthplace and gender have a direct and sizeable impact on the prices quoted to drivers, despite national and international regulations against their use. Birthplace, in particular, is used quite frequently to the disadvantage of foreign-born drivers and drivers born in certain Italian cities. In extreme cases, a driver born in Laos may be charged €1,000 more than a driver born in Milan, all else being equal. For a subset of our sample, we collect quotes directly on a company website, where the direct influence of gender and birthplace is confirmed. Finally, we find that drivers with riskier profiles tend to see fewer quotes in the aggregator result pages, substantiating concerns of differential treatment raised in the past by Italian insurance regulators.
{"title":"Algorithmic Audit of Italian Car Insurance: Evidence of Unfairness in Access and Pricing","authors":"Alessandro Fabris, Alan Mishler, S. Gottardi, Mattia Carletti, Matteo Daicampi, Gian Antonio Susto, Gianmaria Silvello","doi":"10.1145/3461702.3462569","DOIUrl":"https://doi.org/10.1145/3461702.3462569","url":null,"abstract":"We conduct an audit of pricing algorithms employed by companies in the Italian car insurance industry, primarily by gathering quotes through a popular comparison website. While acknowledging the complexity of the industry, we find evidence of several problematic practices. We show that birthplace and gender have a direct and sizeable impact on the prices quoted to drivers, despite national and international regulations against their use. Birthplace, in particular, is used quite frequently to the disadvantage of foreign-born drivers and drivers born in certain Italian cities. In extreme cases, a driver born in Laos may be charged 1,000 more than a driver born in Milan, all else being equal. For a subset of our sample, we collect quotes directly on a company website, where the direct influence of gender and birthplace is confirmed. Finally, we find that drivers with riskier profiles tend to see fewer quotes in the aggregator result pages, substantiating concerns of differential treatment raised in the past by Italian insurance regulators.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129459390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Risk Identification Questionnaire for Detecting Unintended Bias in the Machine Learning Development Lifecycle
M. S. Lee, Jatinder Singh
DOI: https://doi.org/10.1145/3461702.3462572

Unintended biases in machine learning (ML) models have the potential to introduce undue discrimination and exacerbate social inequalities. The research community has proposed various technical and qualitative methods intended to assist practitioners in assessing these biases. While frameworks for identifying the risks of harm due to unintended biases have been proposed, they have not yet been operationalised into practical tools to assist industry practitioners. In this paper, we link prior work on bias assessment methods to phases of a standard organisational risk management process (RMP), noting a gap in measures for helping practitioners identify bias-related risks. Targeting this gap, we introduce a bias identification methodology and questionnaire, illustrating its application through a real-world, practitioner-led use case. We validate the need and usefulness of the questionnaire through a survey of industry practitioners, which provides insights into their practical requirements and preferences. Our results indicate that such a questionnaire is helpful for proactively uncovering unexpected bias concerns, particularly where it is easy to integrate into existing processes, and facilitates communication with non-technical stakeholders. Ultimately, the effective end-to-end management of ML risks requires a more targeted identification of potential harm and its sources, so that appropriate mitigation strategies can be formulated. Towards this, our questionnaire provides a practical means to assist practitioners in identifying bias-related risks.

Measuring Lay Reactions to Personal Data Markets
Aileen Nielsen
DOI: https://doi.org/10.1145/3461702.3462582

The recording, aggregation, and exchange of personal data is necessary to the development of socially-relevant machine learning applications. However, anecdotal and survey evidence show that ordinary people feel discontent and even anger regarding data collection practices that are currently typical and legal. This suggests that personal data markets in their current form do not adhere to the norms applied by ordinary people. The present study experimentally probes whether market transactions in a typical online scenario are accepted when evaluated by lay people. The results show that a high percentage of study participants refused to participate in a data pricing exercise, even in a commercial context where market rules would typically be expected to apply. For those participants who did price the data, the median price was an order of magnitude higher than the market price. These results call into question the notice and consent market paradigm that is used by technology firms and government regulators when evaluating data flows. The results also point to a conceptual mismatch between cultural and legal expectations regarding the use of personal data.

Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective
Flavien Prost, Pranjal Awasthi, Nicholas Blumm, A. Kumthekar, Trevor Potter, Li Wei, Xuezhi Wang, Ed H. Chi, Jilin Chen, Alex Beutel
DOI: https://doi.org/10.1145/3461702.3462603

In this work we study the problem of measuring the fairness of a machine learning model under noisy information. Focusing on group fairness metrics, we investigate the particular but common situation when the evaluation requires controlling for the confounding effect of covariate variables. In a practical setting, we might not be able to jointly observe the covariate and group information, and a standard workaround is to then use proxies for one or more of these variables. Prior works have demonstrated the challenges with using a proxy for sensitive attributes, and strong independence assumptions are needed to provide guarantees on the accuracy of the noisy estimates. In contrast, in this work we study using a proxy for the covariate variable and present a theoretical analysis that aims to characterize weaker conditions under which accurate fairness evaluation is possible. Furthermore, our theory identifies potential sources of errors and decouples them into two interpretable parts y and E. The first part y depends solely on the performance of the proxy such as precision and recall, whereas the second part E captures correlations between all the variables of interest. We show that in many scenarios the error in the estimates is dominated by y via a linear dependence, whereas the dependence on the correlations E only constitutes a lower order term. As a result we expand the understanding of scenarios where measuring model fairness via proxies can be an effective approach. Finally, we compare, via simulations, the theoretical upper-bounds to the distribution of simulated estimation errors and show that assuming some structure on the data, even weak, is key to significantly improve both theoretical guarantees and empirical results.
{"title":"Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective","authors":"Flavien Prost, Pranjal Awasthi, Nicholas Blumm, A. Kumthekar, Trevor Potter, Li Wei, Xuezhi Wang, Ed H. Chi, Jilin Chen, Alex Beutel","doi":"10.1145/3461702.3462603","DOIUrl":"https://doi.org/10.1145/3461702.3462603","url":null,"abstract":"In this work we study the problem of measuring the fairness of a machine learning model under noisy information. Focusing on group fairness metrics, we investigate the particular but common situation when the evaluation requires controlling for the confounding effect of covariate variables. In a practical setting, we might not be able to jointly observe the covariate and group information, and a standard workaround is to then use proxies for one or more of these variables. Prior works have demonstrated the challenges with using a proxy for sensitive attributes, and strong independence assumptions are needed to provide guarantees on the accuracy of the noisy estimates. In contrast, in this work we study using a proxy for the covariate variable and present a theoretical analysis that aims to characterize weaker conditions under which accurate fairness evaluation is possible. Furthermore, our theory identifies potential sources of errors and decouples them into two interpretable parts y and E. The first part y depends solely on the performance of the proxy such as precision and recall, whereas the second part E captures correlations between all the variables of interest. We show that in many scenarios the error in the estimates is dominated by y via a linear dependence, whereas the dependence on the correlations E only constitutes a lower order term. As a result we expand the understanding of scenarios where measuring model fairness via proxies can be an effective approach. Finally, we compare, via simulations, the theoretical upper-bounds to the distribution of simulated estimation errors and show that assuming some structure on the data, even weak, is key to significantly improve both theoretical guarantees and empirical results.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130593533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Are AI Ethics Conferences Different and More Diverse Compared to Traditional Computer Science Conferences?
Daniel Ernesto Acuna, Lizhen Liang
DOI: https://doi.org/10.1145/3461702.3462616

Even though computer science (CS) has had a historical lack of gender and race representation, its AI research affects everybody eventually. Being partially rooted in CS conferences, "AI ethics" (AIE) conferences such as FAccT and AIES have quickly become distinct venues where AI's societal implications are discussed and solutions proposed. However, it is largely unknown if these conferences improve upon the historical representational issues of traditional CS venues. In this work, we explore AIE conferences' evolution and compare them across demographic characteristics, publication content, and citation patterns. We find that AIE conferences have increased their internal topical diversity and impact on other CS conferences. Importantly, AIE conferences are highly differentiable, covering topics not represented in other venues. However, and perhaps contrary to the field's aspirations, white authors are more common while seniority and black researchers are represented similarly to CS venues. Our results suggest that AIE conferences could increase efforts to attract more diverse authors, especially considering their sizable roots in CS.

More Similar Values, More Trust? - The Effect of Value Similarity on Trust in Human-Agent Interaction
Siddharth Mehrotra, C. Jonker, M. Tielman
DOI: https://doi.org/10.1145/3461702.3462576

As AI systems are increasingly involved in decision making, it also becomes important that they elicit appropriate levels of trust from their users. To achieve this, it is first important to understand which factors influence trust in AI. We identify that a research gap exists regarding the role of personal values in trust in AI. Therefore, this paper studies how human and agent Value Similarity (VS) influences a human's trust in that agent. To explore this, 89 participants teamed up with five different agents, which were designed with varying levels of value similarity to that of the participants. In a within-subjects, scenario-based experiment, agents gave suggestions on what to do when entering the building to save a hostage. We analyzed the agent's scores on subjective value similarity, trust and qualitative data from open-ended questions. Our results show that agents rated as having more similar values also scored higher on trust, indicating a positive effect between the two. With this result, we add to the existing understanding of human-agent trust by providing insight into the role of value-similarity.

AI and Shared Prosperity
Katya Klinova, Anton Korinek
DOI: https://doi.org/10.1145/3461702.3462619

Future advances in AI that automate away human labor may have stark implications for labor markets and inequality. This paper proposes a framework to analyze the effects of specific types of AI systems on the labor market, based on how much labor demand they will create versus displace, while taking into account that productivity gains also make society wealthier and thereby contribute to additional labor demand. This analysis enables ethically-minded companies creating or deploying AI systems as well as researchers and policymakers to take into account the effects of their actions on labor markets and inequality, and therefore to steer progress in AI in a direction that advances shared prosperity and an inclusive economic future for all of humanity.