公平感知数据集成

IF 1.5 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS ACM Journal of Data and Information Quality Pub Date : 2022-07-05 DOI:10.1145/3519419
Lacramioara Mazilu, N. Paton, Nikolaos Konstantinou, A. Fernandes
{"title":"公平感知数据集成","authors":"Lacramioara Mazilu, N. Paton, Nikolaos Konstantinou, A. Fernandes","doi":"10.1145/3519419","DOIUrl":null,"url":null,"abstract":"Machine learning can be applied in applications that take decisions that impact people’s lives. Such techniques have the potential to make decision making more objective, but there also is a risk that the decisions can discriminate against certain groups as a result of bias in the underlying data. Reducing bias, or promoting fairness, has been a focus of significant investigation in machine learning, for example, based on pre-processing the training data, changing the learning algorithm, or post-processing the results of the learning. However, prior to these activities, data integration discovers and integrates the data that is used for training, and data integration processes have the potential to produce data that leads to biased conclusions. In this article, we propose an approach that generates schema mappings in ways that take into account: (i) properties that are intrinsic to mapping results that may give rise to bias in analyses; and (ii) bias observed in classifiers trained on the results of different sets of mappings. The approach explores a space of different ways of integrating the data, using a Tabu search algorithm, guided by bias-aware objective functions that represent different types of bias.The resulting approach is evaluated using Adult Census and German Credit datasets to explore the extent to which and the circumstances in which the approach can increase the fairness of the results of the data integration process.","PeriodicalId":44355,"journal":{"name":"ACM Journal of Data and Information Quality","volume":"56 1","pages":"1 - 26"},"PeriodicalIF":1.5000,"publicationDate":"2022-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Fairness-aware Data Integration\",\"authors\":\"Lacramioara Mazilu, N. Paton, Nikolaos Konstantinou, A. Fernandes\",\"doi\":\"10.1145/3519419\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine learning can be applied in applications that take decisions that impact people’s lives. Such techniques have the potential to make decision making more objective, but there also is a risk that the decisions can discriminate against certain groups as a result of bias in the underlying data. Reducing bias, or promoting fairness, has been a focus of significant investigation in machine learning, for example, based on pre-processing the training data, changing the learning algorithm, or post-processing the results of the learning. However, prior to these activities, data integration discovers and integrates the data that is used for training, and data integration processes have the potential to produce data that leads to biased conclusions. In this article, we propose an approach that generates schema mappings in ways that take into account: (i) properties that are intrinsic to mapping results that may give rise to bias in analyses; and (ii) bias observed in classifiers trained on the results of different sets of mappings. The approach explores a space of different ways of integrating the data, using a Tabu search algorithm, guided by bias-aware objective functions that represent different types of bias.The resulting approach is evaluated using Adult Census and German Credit datasets to explore the extent to which and the circumstances in which the approach can increase the fairness of the results of the data integration process.\",\"PeriodicalId\":44355,\"journal\":{\"name\":\"ACM Journal of Data and Information Quality\",\"volume\":\"56 1\",\"pages\":\"1 - 26\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2022-07-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Journal of Data and Information Quality\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3519419\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Journal of Data and Information Quality","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3519419","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 1

摘要

机器学习可以应用于影响人们生活的决策。这些技术有可能使决策更加客观,但也存在这样一种风险,即由于基础数据中的偏见,决策可能歧视某些群体。减少偏见,或促进公平,一直是机器学习中重要研究的焦点,例如,基于预处理训练数据,改变学习算法,或后处理学习结果。然而,在这些活动之前,数据集成发现并集成用于培训的数据,数据集成过程有可能产生导致有偏见的结论的数据。在本文中,我们提出了一种生成模式映射的方法,该方法考虑到:(i)映射结果固有的属性,这些属性可能会在分析中产生偏差;(ii)在不同映射集的结果上训练的分类器中观察到的偏差。不同方式的方法探索空间的整合数据,使用禁忌搜索算法,根据bias-aware目标函数代表不同类型的偏见。使用成人普查和德国信用数据集对结果方法进行评估,以探索该方法可以增加数据整合过程结果公平性的程度和情况。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Fairness-aware Data Integration
Machine learning can be applied in applications that take decisions that impact people’s lives. Such techniques have the potential to make decision making more objective, but there also is a risk that the decisions can discriminate against certain groups as a result of bias in the underlying data. Reducing bias, or promoting fairness, has been a focus of significant investigation in machine learning, for example, based on pre-processing the training data, changing the learning algorithm, or post-processing the results of the learning. However, prior to these activities, data integration discovers and integrates the data that is used for training, and data integration processes have the potential to produce data that leads to biased conclusions. In this article, we propose an approach that generates schema mappings in ways that take into account: (i) properties that are intrinsic to mapping results that may give rise to bias in analyses; and (ii) bias observed in classifiers trained on the results of different sets of mappings. The approach explores a space of different ways of integrating the data, using a Tabu search algorithm, guided by bias-aware objective functions that represent different types of bias.The resulting approach is evaluated using Adult Census and German Credit datasets to explore the extent to which and the circumstances in which the approach can increase the fairness of the results of the data integration process.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
ACM Journal of Data and Information Quality
ACM Journal of Data and Information Quality COMPUTER SCIENCE, INFORMATION SYSTEMS-
CiteScore
4.10
自引率
4.80%
发文量
0
期刊最新文献
Text2EL+: Expert Guided Event Log Enrichment using Unstructured Text A Catalog of Consumer IoT Device Characteristics for Data Quality Estimation AI explainibility and acceptance; a case study for underwater mine hunting Data quality assessment through a preference model Editorial: Special Issue on Quality Aspects of Data Preparation
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1