Data augmentation via diffusion model to enhance AI fairness.

Frontiers in Artificial Intelligence (IF 4.7; JCR Q2, Computer Science, Artificial Intelligence). Pub Date: 2025-03-19. eCollection Date: 2025-01-01. DOI: 10.3389/frai.2025.1530397

Christina Hastings Blow, Lijun Qian, Camille Gibson, Pamela Obiomon, Xishuang Dong

Frontiers in Artificial Intelligence, volume 8, article 1530397. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11961955/pdf/


Abstract

Introduction: AI fairness seeks to improve the transparency and explainability of AI systems by ensuring that their outcomes genuinely reflect the best interests of users. Data augmentation, which involves generating synthetic data from existing datasets, has gained significant attention as a solution to data scarcity. In particular, diffusion models have become a powerful technique for generating synthetic data, especially in fields like computer vision.

Methods: This paper explores the potential of diffusion models to generate synthetic tabular data to improve AI fairness. The Tabular Denoising Diffusion Probabilistic Model (Tab-DDPM), a diffusion model adaptable to any tabular dataset and capable of handling various feature types, was used to generate different amounts of synthetic data for augmentation. Additionally, the sample reweighting method from AIF360 was applied to further enhance AI fairness. Five traditional machine learning models, namely Decision Tree (DT), Gaussian Naive Bayes (GNB), K-Nearest Neighbors (KNN), Logistic Regression (LR), and Random Forest (RF), were used to validate the proposed approach.
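
To make the reweighting step concrete, here is a minimal sketch of the classic reweighing scheme, in which each sample receives the weight P(group) * P(label) / P(group, label). It is written in plain Python rather than with the AIF360 API, and the function name `reweighing_weights` and the toy data are assumptions for illustration only; the abstract does not give the exact configuration the authors used.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per sample: P(group) * P(label) / P(group, label).

    Hypothetical helper for illustration; assumes every (group, label)
    combination that occurs in the data has at least one sample.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Toy data (made up): group 1 = privileged, label 1 = favorable outcome.
groups = [1, 1, 1, 1, 0, 0, 0, 0]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)

# Weighted favorable-outcome rate per group after reweighing.
def weighted_rate(group_value):
    total = sum(w for w, g in zip(weights, groups) if g == group_value)
    favorable = sum(
        w for w, g, y in zip(weights, groups, labels)
        if g == group_value and y == 1
    )
    return favorable / total

rate_privileged = weighted_rate(1)
rate_unprivileged = weighted_rate(0)
```

Under-represented combinations (here, unprivileged samples with a favorable label) get weights above 1, so the weighted favorable rate becomes equal across the two groups. Weights of this kind are typically passed to a classifier's training routine as per-sample weights; how the paper applied them to each of the five models is not stated in the abstract.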

Results and discussion: Experimental results demonstrate that the synthetic data generated by Tab-DDPM improves fairness in binary classification.
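
The abstract does not say which fairness metrics were measured. As an illustration only, the sketch below computes two group-fairness measures commonly reported for binary classifiers, statistical parity difference and disparate impact. The function names, the group encoding (1 = privileged, 0 = unprivileged), and the toy predictions are all assumptions, not values from the paper.

```python
def favorable_rate(predictions, groups, group_value):
    """Share of samples in one group that received the favorable prediction (1)."""
    selected = [p for p, g in zip(predictions, groups) if g == group_value]
    return sum(selected) / len(selected)


def statistical_parity_difference(predictions, groups, unprivileged=0, privileged=1):
    """Unprivileged favorable rate minus privileged favorable rate (0 is parity)."""
    return (favorable_rate(predictions, groups, unprivileged)
            - favorable_rate(predictions, groups, privileged))


def disparate_impact(predictions, groups, unprivileged=0, privileged=1):
    """Ratio of unprivileged to privileged favorable rates (1 is parity)."""
    return (favorable_rate(predictions, groups, unprivileged)
            / favorable_rate(predictions, groups, privileged))


# Toy predictions (made up) for eight samples, four per group.
groups = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

spd = statistical_parity_difference(predictions, groups)
di = disparate_impact(predictions, groups)
```

On this toy data the privileged group receives the favorable prediction three times as often as the unprivileged group. An augmentation or reweighting step that improves fairness would be expected to move the difference toward 0 and the ratio toward 1.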
