{"title":"E-FAIR-DB: Functional Dependencies to Discover Data Bias and Enhance Data Equity","authors":"Fabio Azzalini, Chiara Criscuolo, L. Tanca","doi":"10.1145/3552433","DOIUrl":null,"url":null,"abstract":"Decisions based on algorithms and systems generated from data have become essential tools that pervade all aspects of our daily lives; for these advances to be reliable, the results should be accurate but should also respect all the facets of data equity [11]. In this context, the concepts of Fairness and Diversity have become relevant topics of discussion within the field of Data Science Ethics and, in general, in Data Science. Although data equity is desirable, reconciling this property with accurate decision-making is a critical tradeoff, because applying a repair procedure to restore equity might modify the original data in such a way that the final decision is inaccurate w.r.t. the ultimate objective of the analysis. In this work, we propose E-FAIR-DB, a novel solution that, exploiting the notion of Functional Dependency—a type of data constraint—aims at restoring data equity by discovering and solving discrimination in datasets. The proposed solution is implemented as a pipeline that, first, mines functional dependencies to detect and evaluate fairness and diversity in the input dataset, and then, based on these understandings and on the objective of the data analysis, mitigates data bias, minimizing the number of modifications. Our tool can identify, through the mined dependencies, the attributes of the database that encompass discrimination (e.g., gender, ethnicity, or religion); then, based on these dependencies, it determines the smallest amount of data that must be added and/or removed to mitigate such bias. 
We evaluate our proposal both through theoretical considerations and experiments on two real-world datasets.","PeriodicalId":44355,"journal":{"name":"ACM Journal of Data and Information Quality","volume":"9 1","pages":"1 - 26"},"PeriodicalIF":1.5000,"publicationDate":"2022-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Journal of Data and Information Quality","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3552433","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 2
Abstract
Decisions based on algorithms and on systems generated from data have become essential tools that pervade all aspects of our daily lives; for these advances to be reliable, the results should be accurate but should also respect all the facets of data equity [11]. In this context, the concepts of Fairness and Diversity have become relevant topics of discussion within the field of Data Science Ethics and, more generally, in Data Science. Although data equity is desirable, reconciling this property with accurate decision-making is a critical tradeoff: applying a repair procedure to restore equity might modify the original data in such a way that the final decision becomes inaccurate w.r.t. the ultimate objective of the analysis. In this work, we propose E-FAIR-DB, a novel solution that exploits the notion of Functional Dependency, a type of data constraint, to restore data equity by discovering and resolving discrimination in datasets. The proposed solution is implemented as a pipeline that first mines functional dependencies to detect and evaluate fairness and diversity in the input dataset, and then, based on these findings and on the objective of the data analysis, mitigates data bias while minimizing the number of modifications. Through the mined dependencies, our tool can identify the attributes of the database that encode discrimination (e.g., gender, ethnicity, or religion); then, based on these dependencies, it determines the smallest amount of data that must be added and/or removed to mitigate such bias. We evaluate our proposal both through theoretical considerations and through experiments on two real-world datasets.
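To give an intuition for the kind of dependency mining the abstract describes, the sketch below (an illustrative assumption, not the paper's E-FAIR-DB implementation) measures how strongly a candidate dependency ProtectedAttr → Outcome holds approximately in a toy table: for each value of the protected attribute, the confidence is the share of rows agreeing on the most frequent outcome. A confidence near 1 means the protected attribute largely determines the decision, flagging it as a potential source of bias. The attribute names and data are hypothetical.

```python
# Illustrative sketch (not the authors' implementation): per-group confidence
# of an approximate functional dependency lhs -> rhs in a list of dict rows.
from collections import Counter, defaultdict


def fd_confidence(rows, lhs, rhs):
    """For each LHS value, return (count of most common RHS value) / (group size)."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    return {
        value: counts.most_common(1)[0][1] / sum(counts.values())
        for value, counts in groups.items()
    }


# Hypothetical toy data: 'gender' (protected attribute) and a loan 'decision'.
rows = [
    {"gender": "F", "decision": "deny"},
    {"gender": "F", "decision": "deny"},
    {"gender": "F", "decision": "grant"},
    {"gender": "M", "decision": "grant"},
    {"gender": "M", "decision": "grant"},
    {"gender": "M", "decision": "grant"},
]

conf = fd_confidence(rows, "gender", "decision")
# conf["M"] == 1.0: for men the dependency gender -> decision holds exactly,
# while conf["F"] ~ 0.67, so the attribute is a candidate for bias mitigation.
```

A repair step in the spirit of the abstract would then add or remove the fewest rows needed to weaken such dependencies below a chosen threshold, balancing equity against the accuracy of the downstream analysis.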