Data augmentation via diffusion model to enhance AI fairness.

Frontiers in Artificial Intelligence (IF 4.7; JCR Q2, Computer Science, Artificial Intelligence). Pub Date: 2025-03-19. eCollection Date: 2025-01-01. DOI: 10.3389/frai.2025.1530397
Christina Hastings Blow, Lijun Qian, Camille Gibson, Pamela Obiomon, Xishuang Dong
Journal Article. Frontiers in Artificial Intelligence, volume 8, article 1530397. PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11961955/pdf/
Citations: 0

Abstract

Introduction: AI fairness seeks to improve the transparency and explainability of AI systems by ensuring that their outcomes genuinely reflect the best interests of users. Data augmentation, which involves generating synthetic data from existing datasets, has gained significant attention as a solution to data scarcity. In particular, diffusion models have become a powerful technique for generating synthetic data, especially in fields like computer vision.

Methods: This paper explores the potential of diffusion models to generate synthetic tabular data to improve AI fairness. The Tabular Denoising Diffusion Probabilistic Model (Tab-DDPM), a diffusion model adaptable to any tabular dataset and capable of handling various feature types, was used to augment the training data with different amounts of generated data. Additionally, sample reweighting from AIF360 was employed to further enhance AI fairness. Five traditional machine learning models were used to validate the proposed approach: Decision Tree (DT), Gaussian Naive Bayes (GNB), K-Nearest Neighbors (KNN), Logistic Regression (LR), and Random Forest (RF).
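
To make the reweighting step concrete, here is a minimal sketch of the standard reweighing scheme (Kamiran and Calders), which is the idea behind the reweighing preprocessor in AIF360. It is not the library's code and not necessarily the authors' exact configuration. The function name `reweighing_weights` and the toy data are hypothetical. Each sample receives the weight P(group) * P(label) / P(group, label), so that group membership and label become statistically independent in the weighted data.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per sample: P(group) * P(label) / P(group, label).

    Hypothetical helper for illustration; every (group, label) pair that
    appears in the data gets a well-defined weight.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Toy data (assumed): group 1 = privileged, label 1 = favorable outcome.
groups = [1, 1, 1, 1, 0, 0, 0, 0]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)
```

In this toy data the rare combinations (privileged with an unfavorable label, unprivileged with a favorable label) are up-weighted and the common ones are down-weighted. Such weights are typically passed to a classifier's training routine as per-sample weights, which not every model type supports directly.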

Results and discussion: Experimental results demonstrate that the synthetic data generated by Tab-DDPM improves fairness in binary classification.
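
The abstract does not say which fairness metrics were used. As an illustration only, the sketch below computes statistical parity difference, a common group-fairness measure for binary classification: the favorable-prediction rate of the unprivileged group minus that of the privileged group. The function name, the encoding (1 = privileged, 1 = favorable), and the toy predictions are assumptions.

```python
def statistical_parity_difference(groups, predictions, privileged=1, favorable=1):
    """Favorable-prediction rate of the unprivileged group minus that of the
    privileged group. A value of 0 means both groups receive favorable
    predictions at the same rate. Assumes both groups are non-empty.
    """
    def favorable_rate(in_privileged_group):
        selected = [
            p for g, p in zip(groups, predictions)
            if (g == privileged) == in_privileged_group
        ]
        return sum(1 for p in selected if p == favorable) / len(selected)

    return favorable_rate(False) - favorable_rate(True)


# Toy predictions (assumed): 3 of 4 privileged samples and 1 of 4
# unprivileged samples receive the favorable prediction.
groups = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
spd = statistical_parity_difference(groups, predictions)
```

Under this measure, an improvement in fairness would show up as a value closer to 0 for a classifier trained on the augmented or reweighted data than for one trained on the original data.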

Source journal: Frontiers in Artificial Intelligence
CiteScore: 6.10
Self-citation rate: 2.50%
Articles published: 272
Review time: 13 weeks