软件公平中的“无知与偏见”

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) Pub Date : 2021-05-01 DOI:10.1109/ICSE43902.2021.00129

J Zhang, M. Harman

{"title":"软件公平中的“无知与偏见”","authors":"J Zhang, M. Harman","doi":"10.1109/ICSE43902.2021.00129","DOIUrl":null,"url":null,"abstract":"Machine learning software can be unfair when making human-related decisions, having prejudices over certain groups of people. Existing work primarily focuses on proposing fairness metrics and presenting fairness improvement approaches. It remains unclear how key aspect of any machine learning system, such as feature set and training data, affect fairness. This paper presents results from a comprehensive study that addresses this problem. We find that enlarging the feature set plays a significant role in fairness (with an average effect rate of 38%). Importantly, and contrary to widely-held beliefs that greater fairness often corresponds to lower accuracy, our findings reveal that an enlarged feature set has both higher accuracy and fairness. Perhaps also surprisingly, we find that a larger training data does not help to improve fairness. Our results suggest a larger training data set has more unfairness than a smaller one when feature sets are insufficient; an important cautionary finding for practising software engineers.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"141 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"41","resultStr":"{\"title\":\"\\\"Ignorance and Prejudice\\\" in Software Fairness\",\"authors\":\"J Zhang, M. Harman\",\"doi\":\"10.1109/ICSE43902.2021.00129\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine learning software can be unfair when making human-related decisions, having prejudices over certain groups of people. Existing work primarily focuses on proposing fairness metrics and presenting fairness improvement approaches. It remains unclear how key aspect of any machine learning system, such as feature set and training data, affect fairness. This paper presents results from a comprehensive study that addresses this problem. We find that enlarging the feature set plays a significant role in fairness (with an average effect rate of 38%). Importantly, and contrary to widely-held beliefs that greater fairness often corresponds to lower accuracy, our findings reveal that an enlarged feature set has both higher accuracy and fairness. Perhaps also surprisingly, we find that a larger training data does not help to improve fairness. Our results suggest a larger training data set has more unfairness than a smaller one when feature sets are insufficient; an important cautionary finding for practising software engineers.\",\"PeriodicalId\":305167,\"journal\":{\"name\":\"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)\",\"volume\":\"141 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"41\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSE43902.2021.00129\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE43902.2021.00129","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 41

摘要

机器学习软件在做出与人类相关的决策时可能是不公平的，对某些群体有偏见。现有的工作主要集中在提出公平指标和提出公平改进方法。目前还不清楚任何机器学习系统的关键方面，如特征集和训练数据，是如何影响公平性的。本文介绍了一项针对这一问题的综合研究的结果。我们发现，扩大特征集对公平性有显著作用(平均有效率为38%)。重要的是，与普遍认为的更高的公平性往往对应更低的准确性相反，我们的研究结果表明，扩大的特征集具有更高的准确性和公平性。也许同样令人惊讶的是，我们发现更大的训练数据无助于提高公平性。我们的结果表明，当特征集不足时，较大的训练数据集比较小的训练数据集具有更大的不公平性;这是对软件工程师的重要警示。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

"Ignorance and Prejudice" in Software Fairness

Machine learning software can be unfair when making human-related decisions, having prejudices over certain groups of people. Existing work primarily focuses on proposing fairness metrics and presenting fairness improvement approaches. It remains unclear how key aspect of any machine learning system, such as feature set and training data, affect fairness. This paper presents results from a comprehensive study that addresses this problem. We find that enlarging the feature set plays a significant role in fairness (with an average effect rate of 38%). Importantly, and contrary to widely-held beliefs that greater fairness often corresponds to lower accuracy, our findings reveal that an enlarged feature set has both higher accuracy and fairness. Perhaps also surprisingly, we find that a larger training data does not help to improve fairness. Our results suggest a larger training data set has more unfairness than a smaller one when feature sets are insufficient; an important cautionary finding for practising software engineers.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

自引率

0.00%

发文量

期刊最新文献

MuDelta: Delta-Oriented Mutation Testing at Commit Time Verifying Determinism in Sequential Programs Data-Oriented Differential Testing of Object-Relational Mapping Systems IoT Bugs and Development Challenges Onboarding vs. Diversity, Productivity and Quality — Empirical Study of the OpenStack Ecosystem