A Coverage-based Approach to Nondiscrimination-aware Data Transformation
Chiara Accinelli, B. Catania, G. Guerrini, Simone Minisi
ACM Journal of Data and Information Quality, pp. 1-26. Published 2022-07-08. DOI: 10.1145/3546913
Citations: 4
Abstract
Developing technological solutions that satisfy nondiscrimination requirements is one of the main current challenges in data processing. Back-end operators that prepare data, i.e., extract and transform it, play a relevant role with respect to nondiscrimination, since any bias they introduce can affect the entire data life cycle. In this article, we focus on back-end transformations expressed as Select-Project-Join (SPJ) queries, and on coverage. Coverage aims to guarantee that the input (or training) dataset includes enough examples for each (protected) category of interest, thus increasing diversity in order to limit the introduction of bias in subsequent analytical steps. The article proposes an approach that automatically rewrites a transformation whose result violates coverage constraints into the “closest” query that satisfies them. The approach is approximate and relies on sample-based cardinality estimation, so it introduces a trade-off between accuracy and efficiency. The efficiency and effectiveness of the approach are validated experimentally on synthetic and real data.
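To make the ideas of a coverage constraint, coverage-driven query rewriting, and sample-based cardinality estimation concrete, here is a minimal Python sketch under strong simplifying assumptions. The transformation is a single-table selection `income >= threshold` (no projection or join); the protected attribute is `gender`; a coverage constraint requires at least `k` result rows per group; and "closest" is taken to mean the smallest decrease of the threshold among a fixed list of candidates. All function names (`draw_sample`, `estimate_group_counts`, `satisfies_coverage`, `relax_threshold`) and the toy data are hypothetical. This is an illustration of the general idea, not the authors' algorithm, which targets Select-Project-Join queries and uses its own notion of closeness.

```python
import random
from collections import Counter
from typing import Any, Dict, Iterable, List, Optional

# Hypothetical row layout for this sketch: a protected attribute plus one numeric column.
Row = Dict[str, Any]


def draw_sample(rows: List[Row], fraction: float, seed: int = 0) -> List[Row]:
    """Draw a uniform random sample (without replacement) of about `fraction` of the rows."""
    n = max(1, min(len(rows), round(len(rows) * fraction)))
    return random.Random(seed).sample(rows, n)


def estimate_group_counts(sample: List[Row], population_size: int,
                          threshold: float, attr: str) -> Dict[Any, float]:
    """Estimate per-group cardinalities of the selection `income >= threshold`
    by counting matches in the sample and scaling by population/sample size."""
    scale = population_size / len(sample)
    counts = Counter(r[attr] for r in sample if r["income"] >= threshold)
    return {group: c * scale for group, c in counts.items()}


def satisfies_coverage(counts: Dict[Any, float], constraints: Dict[Any, int]) -> bool:
    """Coverage constraint: every protected group g needs at least k_g result rows."""
    return all(counts.get(group, 0) >= k for group, k in constraints.items())


def relax_threshold(rows: List[Row], threshold: float, attr: str,
                    constraints: Dict[Any, int], candidates: Iterable[float],
                    sample_fraction: float = 0.1, seed: int = 0) -> Optional[float]:
    """Return the threshold <= `threshold` (original value or a candidate) closest to the
    original whose *estimated* result meets every coverage constraint, or None."""
    sample = draw_sample(rows, sample_fraction, seed)  # drawn once, reused for every estimate
    # Relaxing `income >= threshold` means lowering the threshold; try the smallest changes first.
    for t in sorted({threshold, *candidates}, key=lambda c: threshold - c):
        if t > threshold:
            continue  # raising the threshold can only shrink the result
        if satisfies_coverage(estimate_group_counts(sample, len(rows), t, attr), constraints):
            return t
    return None


if __name__ == "__main__":
    # Toy data: group "F" has systematically lower incomes than group "M".
    rows = ([{"gender": "M", "income": 30 + i} for i in range(60)]
            + [{"gender": "F", "income": 20 + i} for i in range(40)])
    # With sample_fraction=1.0 the estimates are exact: income >= 50 yields only 10 "F" rows,
    # so the threshold is lowered until each group has at least 20 rows.
    print(relax_threshold(rows, 50, "gender", {"M": 20, "F": 20},
                          range(0, 101, 5), sample_fraction=1.0))  # prints 40
```

Lowering `sample_fraction` makes each estimate cheaper but noisier, so the chosen threshold can be off in either direction; in miniature, this is the accuracy/efficiency trade-off the abstract describes.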