{"title":"Adversarial Training Negatively Affects Fairness","authors":"Korn Sooksatra, P. Rivas","doi":"10.1109/CSCI54926.2021.00096","DOIUrl":null,"url":null,"abstract":"With the increasing presence of deep learning models, many applications have had significant improvements; however, they face a new vulnerability known as adversarial examples. Adversarial examples can mislead deep learning models to predict the wrong classes without human actors noticing. Recently, many works have tried to improve adversarial examples to make them stronger and more effective. However, although some researchers have invented mechanisms to defend deep learning models against adversarial examples, those mechanisms may negatively affect different measures of fairness, which are critical in practice. This work mathematically defines four fairness scores to show that training adversarially robust models can harm fairness scores. Furthermore, we empirically show that adversarial training, one of the most potent defensive mechanisms against adversarial examples, can harm them.","PeriodicalId":206881,"journal":{"name":"2021 International Conference on Computational Science and Computational Intelligence (CSCI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Computational Science and Computational Intelligence (CSCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSCI54926.2021.00096","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
With the increasing presence of deep learning models, many applications have seen significant improvements; however, these models face a new vulnerability known as adversarial examples. Adversarial examples can mislead deep learning models into predicting the wrong classes without a human observer noticing. Recently, many works have sought to make adversarial examples stronger and more effective. Although researchers have devised mechanisms to defend deep learning models against adversarial examples, those mechanisms may negatively affect measures of fairness that are critical in practice. This work mathematically defines four fairness scores and shows that training adversarially robust models can harm them. Furthermore, we empirically show that adversarial training, one of the most potent defensive mechanisms against adversarial examples, can degrade these fairness scores.
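For context, the sketch below illustrates the two ingredients the abstract references, under assumed names and a PyTorch model: an FGSM-based adversarial-training step (one common form of adversarial training, not necessarily the exact procedure used in the paper) and a demographic-parity-style gap as an example group-fairness score (the paper's four fairness scores are defined in the full text and are not reproduced here).

```python
# Hypothetical sketch: FGSM-based adversarial training and a simple
# group-fairness gap. Illustrative only; names and hyperparameters
# (eps, the binary group attribute) are assumptions, not the paper's.
import torch
import torch.nn.functional as F


def fgsm_adversarial_step(model, optimizer, x, y, eps=0.03):
    """One adversarial-training update: craft FGSM perturbations of the
    batch, then train the model on the perturbed inputs."""
    model.eval()
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # FGSM: take one epsilon-sized step in the sign of the input gradient,
    # i.e., the direction that locally increases the loss.
    x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

    model.train()
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()


def demographic_parity_gap(preds, groups):
    """Absolute difference in positive-prediction rate between two groups
    (a common fairness score; not necessarily one of the paper's four)."""
    rate_0 = preds[groups == 0].float().mean()
    rate_1 = preds[groups == 1].float().mean()
    return (rate_0 - rate_1).abs().item()
```

Comparing such a fairness gap for a standardly trained model against one trained with the adversarial step above is one way to probe the trade-off the abstract describes.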