Fairness-aware Data Integration
Lacramioara Mazilu, N. Paton, Nikolaos Konstantinou, A. Fernandes
ACM Journal of Data and Information Quality, pages 1-26, published 2022-07-05 (journal article)
DOI: 10.1145/3519419 (https://doi.org/10.1145/3519419)
Citations: 1
Abstract
Machine learning is used in applications that take decisions affecting people’s lives. Such techniques have the potential to make decision making more objective, but there is also a risk that the decisions discriminate against certain groups as a result of bias in the underlying data. Reducing bias, or promoting fairness, has been a focus of significant investigation in machine learning, for example by pre-processing the training data, changing the learning algorithm, or post-processing the results of the learning. However, prior to these activities, data integration discovers and integrates the data that is used for training, and data integration processes can themselves produce data that leads to biased conclusions. In this article, we propose an approach that generates schema mappings in ways that take into account: (i) properties intrinsic to mapping results that may give rise to bias in analyses; and (ii) bias observed in classifiers trained on the results of different sets of mappings. The approach uses a Tabu search algorithm to explore a space of different ways of integrating the data, guided by bias-aware objective functions that represent different types of bias. The approach is evaluated using the Adult Census and German Credit datasets to explore the extent to which, and the circumstances in which, it can increase the fairness of the results of the data integration process.
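The search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the neighbourhood (add or drop one candidate mapping), the tabu tenure, and the objective (the gap in positive-outcome rate between two groups, a simple stand-in for a property intrinsic to the mapping results) are all assumptions made for illustration, as are the names parity_gap, bias and tabu_search and the toy data.

```python
from collections import deque


def parity_gap(rows):
    """Absolute difference in positive-outcome rate between groups 'a' and 'b'.

    Each row is a (group, label) pair with label 0 or 1. Hypothetical objective.
    """
    rates = []
    for group in ("a", "b"):
        labels = [label for g, label in rows if g == group]
        if not labels:
            return 1.0  # assumption: a missing group counts as maximally biased
        rates.append(sum(labels) / len(labels))
    return abs(rates[0] - rates[1])


def bias(selection, mappings):
    """Bias of the integrated result: union of the rows of the selected mappings."""
    rows = [row for i in sorted(selection) for row in mappings[i]]
    return parity_gap(rows)


def tabu_search(mappings, start, iterations=20, tenure=2):
    """Search over sets of mappings, minimising bias (lower is fairer)."""
    current = frozenset(start)
    best, best_score = current, bias(current, mappings)
    tabu = deque(maxlen=tenure)  # indices of recently flipped mappings
    for _ in range(iterations):
        candidates = []
        for i in range(len(mappings)):
            neighbour = current ^ {i}  # add mapping i if absent, drop it if present
            if not neighbour:
                continue  # an empty selection integrates nothing
            score = bias(neighbour, mappings)
            # Aspiration: a tabu move is allowed if it beats the best score so far.
            if i not in tabu or score < best_score:
                candidates.append((score, i, neighbour))
        if not candidates:
            break
        # Take the best admissible neighbour even if it is worse than current,
        # which is what lets Tabu search leave local optima.
        score, move, current = min(candidates, key=lambda c: (c[0], c[1]))
        tabu.append(move)
        if score < best_score:
            best, best_score = current, score
    return best, best_score


# Toy candidate mappings (made-up data): each yields (group, label) rows.
mappings = [
    [("a", 1), ("a", 1), ("b", 0)],
    [("a", 0), ("b", 1), ("b", 1)],
    [("a", 1), ("b", 0), ("b", 0)],
]
best, best_score = tabu_search(mappings, start={0})
print(sorted(best), best_score)
```

To reflect objective (ii) from the abstract, bias could instead train a classifier on the integrated rows and return a fairness metric computed on its predictions; the search loop would stay the same.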