According to a prominent approach to AI alignment, AI agents should be built to learn and promote human values. However, humans value things in several different ways: we have desires and preferences of various kinds, and if we engage in reinforcement learning, we also have reward functions. One research project to which this approach gives rise is therefore to say which of these various classes of human values should be promoted. This paper takes on part of this project by assessing the proposal that human reward functions should be the target for AI alignment. There is some reason to believe that powerful AI agents which were aligned to values of this form would help us to lead good lives, but there is also considerable uncertainty about this claim, arising from unresolved empirical and conceptual issues in human psychology.
{"title":"AI Alignment and Human Reward","authors":"Patrick Butlin","doi":"10.1145/3461702.3462570","DOIUrl":"https://doi.org/10.1145/3461702.3462570","url":null,"abstract":"According to a prominent approach to AI alignment, AI agents should be built to learn and promote human values. However, humans value things in several different ways: we have desires and preferences of various kinds, and if we engage in reinforcement learning, we also have reward functions. One research project to which this approach gives rise is therefore to say which of these various classes of human values should be promoted. This paper takes on part of this project by assessing the proposal that human reward functions should be the target for AI alignment. There is some reason to believe that powerful AI agents which were aligned to values of this form would help us to lead good lives, but there is also considerable uncertainty about this claim, arising from unresolved empirical and conceptual issues in human psychology.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131148103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computers are used to make decisions in an increasing number of domains. There is widespread agreement that some of these uses are ethically problematic. Far less clear is where ethical problems arise, and what might be done about them. This paper expands and defends the Ethical Gravity Thesis: ethical problems that arise at higher levels of analysis of an automated decision-making system are inherited by lower levels of analysis. Particular instantiations of systems can add new problems, but not ameliorate more general ones. We defend this thesis by adapting Marr's famous 1982 framework for understanding information-processing systems. We show how this framework allows one to situate ethical problems at the appropriate level of abstraction, which in turn can be used to target appropriate interventions.
{"title":"The Ethical Gravity Thesis: Marrian Levels and the Persistence of Bias in Automated Decision-making Systems","authors":"A. Kasirzadeh, C. Klein","doi":"10.1145/3461702.3462606","DOIUrl":"https://doi.org/10.1145/3461702.3462606","url":null,"abstract":"Computers are used to make decisions in an increasing number of domains. There is widespread agreement that some of these uses are ethically problematic. Far less clear is where ethical problems arise, and what might be done about them. This paper expands and defends the Ethical Gravity Thesis: ethical problems that arise at higher levels of analysis of an automated decision-making system are inherited by lower levels of analysis. Particular instantiations of systems can add new problems, but not ameliorate more general ones. We defend this thesis by adapting Marr's famous 1982 framework for understanding information-processing systems. We show how this framework allows one to situate ethical problems at the appropriate level of abstraction, which in turn can be used to target appropriate interventions.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125517306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This is a one-page summary of the paper "A Philosophical Theory of Fairness for Prediction-based Decisions." The full paper is available on SSRN at the following link: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3450300
{"title":"Fair Equality of Chances for Prediction-based Decisions","authors":"M. Loi, Anders Herlitz, Hoda Heidari","doi":"10.1145/3461702.3462613","DOIUrl":"https://doi.org/10.1145/3461702.3462613","url":null,"abstract":"This is a one-page summary of the paper \"A Philosophical Theory of Fairness for Prediction-based Decisions.\" The full paper is available on SSRN at the following link: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3450300","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"2 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120970923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Causal reasoning is the main learning and explanation tool used by humans. AI systems should possess causal reasoning capabilities to be deployed in the real world with trust and reliability. Introducing the ideas of causality to machine learning helps in providing better learning and explainable models. Explainability, causal disentanglement are some important aspects of any machine learning model. Causal explanations are required to believe in a model's decision and causal disentanglement learning is important for transfer learning applications. We exploit the ideas of causality to be used in deep learning models to achieve better and causally explainable models that are useful in fairness, disentangled representation, etc.
{"title":"Causality in Neural Networks - An Extended Abstract","authors":"Abbavaram Gowtham Reddy","doi":"10.1145/3461702.3462467","DOIUrl":"https://doi.org/10.1145/3461702.3462467","url":null,"abstract":"Causal reasoning is the main learning and explanation tool used by humans. AI systems should possess causal reasoning capabilities to be deployed in the real world with trust and reliability. Introducing the ideas of causality to machine learning helps in providing better learning and explainable models. Explainability, causal disentanglement are some important aspects of any machine learning model. Causal explanations are required to believe in a model's decision and causal disentanglement learning is important for transfer learning applications. We exploit the ideas of causality to be used in deep learning models to achieve better and causally explainable models that are useful in fairness, disentangled representation, etc.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115452262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One of the main lines of research in algorithmic fairness involves individual fairness (IF) methods. Individual fairness is motivated by an intuitive principle, similar treatment, which requires that similar individuals be treated similarly. IF offers a precise account of this principle using distance metrics to evaluate the similarity of individuals. Proponents of individual fairness have argued that it gives the correct definition of algorithmic fairness, and that it should therefore be preferred to other methods for determining fairness. I argue that individual fairness cannot serve as a definition of fairness. Moreover, IF methods should not be given priority over other fairness methods, nor used in isolation from them. To support these conclusions, I describe four in-principle problems for individual fairness as a definition and as a method for ensuring fairness: (1) counterexamples show that similar treatment (and therefore IF) are insufficient to guarantee fairness; (2) IF methods for learning similarity metrics are at risk of encoding human implicit bias; (3) IF requires prior moral judgments, limiting its usefulness as a guide for fairness and undermining its claim to define fairness; and (4) the incommensurability of relevant moral values makes similarity metrics impossible for many tasks. In light of these limitations, I suggest that individual fairness cannot be a definition of fairness, and instead should be seen as one tool among several for ameliorating algorithmic bias.
{"title":"What's Fair about Individual Fairness?","authors":"W. Fleisher","doi":"10.1145/3461702.3462621","DOIUrl":"https://doi.org/10.1145/3461702.3462621","url":null,"abstract":"One of the main lines of research in algorithmic fairness involves individual fairness (IF) methods. Individual fairness is motivated by an intuitive principle, similar treatment, which requires that similar individuals be treated similarly. IF offers a precise account of this principle using distance metrics to evaluate the similarity of individuals. Proponents of individual fairness have argued that it gives the correct definition of algorithmic fairness, and that it should therefore be preferred to other methods for determining fairness. I argue that individual fairness cannot serve as a definition of fairness. Moreover, IF methods should not be given priority over other fairness methods, nor used in isolation from them. To support these conclusions, I describe four in-principle problems for individual fairness as a definition and as a method for ensuring fairness: (1) counterexamples show that similar treatment (and therefore IF) are insufficient to guarantee fairness; (2) IF methods for learning similarity metrics are at risk of encoding human implicit bias; (3) IF requires prior moral judgments, limiting its usefulness as a guide for fairness and undermining its claim to define fairness; and (4) the incommensurability of relevant moral values makes similarity metrics impossible for many tasks. In light of these limitations, I suggest that individual fairness cannot be a definition of fairness, and instead should be seen as one tool among several for ameliorating algorithmic bias.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129270772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work considers universal adversarial triggers, a method of adversarially disrupting natural language models, and questions if it is possible to use such triggers to affect both the topic and stance of conditional text generation models. In considering four "controversial" topics, this work demonstrates success at identifying triggers that cause the GPT-2 model to produce text about targeted topics as well as influence the stance the text takes towards the topic. We show that, while the more fringe topics are more challenging to identify triggers for, they do appear to more effectively discriminate aspects like stance. We view this both as an indication of the dangerous potential for controllability and, perhaps, a reflection of the nature of the disconnect between conflicting views on these topics, something that future work could use to question the nature of filter bubbles and if they are reflected within models trained on internet content. In demonstrating the feasibility and ease of such an attack, this work seeks to raise the awareness that neural language models are susceptible to this influence--even if the model is already deployed and adversaries lack internal model access--and advocates the immediate safeguarding against this type of adversarial attack in order to prevent potential harm to human users.
{"title":"The Earth Is Flat and the Sun Is Not a Star: The Susceptibility of GPT-2 to Universal Adversarial Triggers","authors":"Hunter Scott Heidenreich, J. Williams","doi":"10.1145/3461702.3462578","DOIUrl":"https://doi.org/10.1145/3461702.3462578","url":null,"abstract":"This work considers universal adversarial triggers, a method of adversarially disrupting natural language models, and questions if it is possible to use such triggers to affect both the topic and stance of conditional text generation models. In considering four \"controversial\" topics, this work demonstrates success at identifying triggers that cause the GPT-2 model to produce text about targeted topics as well as influence the stance the text takes towards the topic. We show that, while the more fringe topics are more challenging to identify triggers for, they do appear to more effectively discriminate aspects like stance. We view this both as an indication of the dangerous potential for controllability and, perhaps, a reflection of the nature of the disconnect between conflicting views on these topics, something that future work could use to question the nature of filter bubbles and if they are reflected within models trained on internet content. In demonstrating the feasibility and ease of such an attack, this work seeks to raise the awareness that neural language models are susceptible to this influence--even if the model is already deployed and adversaries lack internal model access--and advocates the immediate safeguarding against this type of adversarial attack in order to prevent potential harm to human users.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"199 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125867441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Emily Diana, Wesley Gill, Michael Kearns, K. Kenthapadi, Aaron Roth
We consider a recently introduced framework in which fairness is measured by worst-case outcomes across groups, rather than by the more standard differences between group outcomes. In this framework we provide provably convergent oracle-efficient learning algorithms (or equivalently, reductions to non-fair learning) for minimax group fairness. Here the goal is that of minimizing the maximum loss across all groups, rather than equalizing group losses. Our algorithms apply to both regression and classification settings and support both overall error and false positive or false negative rates as the fairness measure of interest. They also support relaxations of the fairness constraints, thus permitting study of the tradeoff between overall accuracy and minimax fairness. We compare the experimental behavior and performance of our algorithms across a variety of fairness-sensitive data sets and show empirical cases in which minimax fairness is strictly and strongly preferable to equal outcome notions.
{"title":"Minimax Group Fairness: Algorithms and Experiments","authors":"Emily Diana, Wesley Gill, Michael Kearns, K. Kenthapadi, Aaron Roth","doi":"10.1145/3461702.3462523","DOIUrl":"https://doi.org/10.1145/3461702.3462523","url":null,"abstract":"We consider a recently introduced framework in which fairness is measured by worst-case outcomes across groups, rather than by the more standard differences between group outcomes. In this framework we provide provably convergent oracle-efficient learning algorithms (or equivalently, reductions to non-fair learning) for minimax group fairness. Here the goal is that of minimizing the maximum loss across all groups, rather than equalizing group losses. Our algorithms apply to both regression and classification settings and support both overall error and false positive or false negative rates as the fairness measure of interest. They also support relaxations of the fairness constraints, thus permitting study of the tradeoff between overall accuracy and minimax fairness. We compare the experimental behavior and performance of our algorithms across a variety of fairness-sensitive data sets and show empirical cases in which minimax fairness is strictly and strongly preferable to equal outcome notions.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125895903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artificial intelligence (AI) systems operate in increasingly diverse areas, from healthcare to facial recognition, the stock market, autonomous vehicles, and so on. While the underlying digital infrastructure of AI systems is developing rapidly, each area of implementation is subject to different degrees and processes of legitimization. By combining elements from institutional theory and information systems-theory, this paper presents a conceptual framework to analyze and understand AI-induced field-change. The introduction of novel AI-agents into new or existing fields creates a dynamic in which algorithms (re)shape organizations and institutions while existing institutional infrastructures determine the scope and speed at which organizational change is allowed to occur. Where institutional infrastructure and governance arrangements, such as standards, rules, and regulations, still are unelaborate, the field can move fast but is also more likely to be contested. The institutional infrastructure surrounding AI-induced fields is generally little elaborated, which could be an obstacle to the broader institutionalization of AI-systems going forward.
{"title":"A Framework for Understanding AI-Induced Field Change: How AI Technologies are Legitimized and Institutionalized","authors":"B. Larsen","doi":"10.1145/3461702.3462591","DOIUrl":"https://doi.org/10.1145/3461702.3462591","url":null,"abstract":"Artificial intelligence (AI) systems operate in increasingly diverse areas, from healthcare to facial recognition, the stock market, autonomous vehicles, and so on. While the underlying digital infrastructure of AI systems is developing rapidly, each area of implementation is subject to different degrees and processes of legitimization. By combining elements from institutional theory and information systems-theory, this paper presents a conceptual framework to analyze and understand AI-induced field-change. The introduction of novel AI-agents into new or existing fields creates a dynamic in which algorithms (re)shape organizations and institutions while existing institutional infrastructures determine the scope and speed at which organizational change is allowed to occur. Where institutional infrastructure and governance arrangements, such as standards, rules, and regulations, still are unelaborate, the field can move fast but is also more likely to be contested. The institutional infrastructure surrounding AI-induced fields is generally little elaborated, which could be an obstacle to the broader institutionalization of AI-systems going forward.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133151960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ethically compliant autonomous systems (ECAS) are the state-of-the-art for solving sequential decision-making problems under uncertainty while respecting constraints that encode ethical considerations. This paper defines a novel concept in the context of ECAS that is from moral philosophy, the moral community, which leads to a nuanced taxonomy of explicit ethical agents. We then propose new ethical frameworks that extend the applicability of ECAS to domains where a moral community is required. Next, we provide a formal analysis of the proposed ethical frameworks and conduct experiments that illustrate their differences. Finally, we discuss the implications of explicit moral communities that could shape research on standards and guidelines for ethical agents in order to better understand and predict common errors in their design and communicate their capabilities.
{"title":"Ethically Compliant Planning within Moral Communities","authors":"Samer B. Nashed, Justin Svegliato, S. Zilberstein","doi":"10.1145/3461702.3462522","DOIUrl":"https://doi.org/10.1145/3461702.3462522","url":null,"abstract":"Ethically compliant autonomous systems (ECAS) are the state-of-the-art for solving sequential decision-making problems under uncertainty while respecting constraints that encode ethical considerations. This paper defines a novel concept in the context of ECAS that is from moral philosophy, the moral community, which leads to a nuanced taxonomy of explicit ethical agents. We then propose new ethical frameworks that extend the applicability of ECAS to domains where a moral community is required. Next, we provide a formal analysis of the proposed ethical frameworks and conduct experiments that illustrate their differences. Finally, we discuss the implications of explicit moral communities that could shape research on standards and guidelines for ethical agents in order to better understand and predict common errors in their design and communicate their capabilities.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114370160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This is an experience report describing a pilot AI Ethics course for undergraduate computer science majors. In addition to teaching students about different ethical approaches and using them to analyze ethical issues, the course covered how ethics has been incorporated into the implementation of explicit ethical agents, and required students to implement an explicit ethical agent for a simple application. This report describes the course objectives and design, the topics covered, and a qualitative evaluation with suggestions for future offerings of the courses.
{"title":"An AI Ethics Course Highlighting Explicit Ethical Agents","authors":"N. Green","doi":"10.1145/3461702.3462552","DOIUrl":"https://doi.org/10.1145/3461702.3462552","url":null,"abstract":"This is an experience report describing a pilot AI Ethics course for undergraduate computer science majors. In addition to teaching students about different ethical approaches and using them to analyze ethical issues, the course covered how ethics has been incorporated into the implementation of explicit ethical agents, and required students to implement an explicit ethical agent for a simple application. This report describes the course objectives and design, the topics covered, and a qualitative evaluation with suggestions for future offerings of the courses.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"285 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131559116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}