E-FAIR-DB: Functional Dependencies to Discover Data Bias and Enhance Data Equity

ACM Journal of Data and Information Quality (IF 1.5, Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS) | Pub Date: 2022-08-04 | DOI: 10.1145/3552433
Fabio Azzalini, Chiara Criscuolo, L. Tanca
Publication type: Journal Article | Pages: 1-26 | Volume field as listed: 9 1 | Open access: no | Source platform: Semanticscholar | URL: https://doi.org/10.1145/3552433
Citations: 2

Abstract

Decisions based on algorithms and systems generated from data pervade all aspects of our daily lives; for them to be reliable, the results should be accurate and should also respect all the facets of data equity [11]. In this context, Fairness and Diversity have become relevant topics of discussion within Data Science Ethics and in Data Science in general. Although data equity is desirable, reconciling it with accurate decision-making involves a critical tradeoff: applying a repair procedure to restore equity might modify the original data in such a way that the final decision is inaccurate w.r.t. the ultimate objective of the analysis. In this work, we propose E-FAIR-DB, a novel solution that exploits the notion of Functional Dependency (a type of data constraint) to restore data equity by discovering and mitigating discrimination in datasets. The proposed solution is implemented as a pipeline that first mines functional dependencies to detect and evaluate fairness and diversity in the input dataset and then, based on these findings and on the objective of the data analysis, mitigates data bias while minimizing the number of modifications. Through the mined dependencies, our tool identifies the attributes of the database that encompass discrimination (e.g., gender, ethnicity, or religion); based on these dependencies, it then determines the smallest amount of data that must be added and/or removed to mitigate such bias. We evaluate our proposal through both theoretical considerations and experiments on two real-world datasets.
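
To make the detect-then-repair idea in the abstract concrete, here is a minimal sketch in Python. It is not the E-FAIR-DB algorithm: the abstract does not specify the metrics or the repair procedure, so everything below is an assumption made for illustration. The toy dataset, the function names (`counts`, `difference`, `tuples_to_add`), the bias score (how far a dependency "protected value implies target value" deviates from the overall rate of the target), and the addition-only repair are all hypothetical.

```python
# Hypothetical toy data: (gender, income) pairs; not taken from the paper.
ROWS = (
    [("F", ">50K")] * 2 + [("F", "<=50K")] * 8
    + [("M", ">50K")] * 5 + [("M", "<=50K")] * 5
)


def counts(rows, protected_value, target_value):
    """Return (hits, size) inside the protected group, then (hits, size) outside it."""
    inside = [t for p, t in rows if p == protected_value]
    outside = [t for p, t in rows if p != protected_value]
    return (inside.count(target_value), len(inside),
            outside.count(target_value), len(outside))


def difference(rows, protected_value, target_value):
    """Assumed bias score: confidence of the dependency
    (protected = value -> target = value) minus the overall rate of the target.
    A value far from zero suggests the protected attribute influences the target."""
    a, n, b, m = counts(rows, protected_value, target_value)
    if n == 0:
        return 0.0
    return a / n - (a + b) / (n + m)


def tuples_to_add(rows, protected_value, target_value):
    """Smallest k such that adding k rows (protected_value, target_value)
    lifts the group's rate to at least the rate of the remaining rows.
    Solves (a + k) / (n + k) >= b / m with integer arithmetic.
    Returns None when no finite k can close the gap."""
    a, n, b, m = counts(rows, protected_value, target_value)
    if n == 0 or m == 0:
        return 0
    gap = b * n - a * m
    if gap <= 0:
        return 0  # the group is already at or above the reference rate
    if m == b:
        return None  # reference rate is 100%, unreachable by additions alone
    return -(-gap // (m - b))  # ceiling division


print(round(difference(ROWS, "F", ">50K"), 3))
print(tuples_to_add(ROWS, "F", ">50K"))  # → 6
```

In this toy case, 2 of 10 rows in the "F" group have the high-income value against 5 of 10 elsewhere, so six added rows bring the group to 8 of 16, matching the reference rate. A fuller treatment would also consider removals and the effect on the downstream analysis, as the abstract indicates.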
Source journal
ACM Journal of Data and Information Quality (COMPUTER SCIENCE, INFORMATION SYSTEMS)
CiteScore: 4.10
Self-citation rate: 4.80%
Articles published: 0