A Benchmark Proposal for Non-Generative Fair Adversarial Learning Strategies Using a Fairness-Utility Trade-off Metric

Computational Intelligence, vol. 40, no. 5. Published 2024-10-21. DOI: 10.1111/coin.70003

Impact factor 1.8; Zone 4 (Computer Science); JCR Q3 (Computer Science, Artificial Intelligence)

Luiz Fernando F. P. de Lima, Danielle Rousy D. Ricarte, Clauirton A. Siebra


Abstract

AI systems for decision-making have become increasingly popular in several areas. However, biased decisions can be identified in many applications, which has become a concern for the computer science, artificial intelligence, and law communities. Researchers are therefore proposing solutions to mitigate bias and discrimination in decision-making systems. Some strategies use generative adversarial networks (GANs) to generate fair data. Others rely on adversarial learning, encoding fairness constraints through an adversarial model. Moreover, each proposal usually assesses its model with its own metric, which makes comparing current approaches a complex task. This work therefore proposes a systematic benchmark procedure for assessing fair machine learning models. The procedure comprises a fairness-utility trade-off metric ($FU\text{-}score$), the utility and fairness metrics that compose this assessment, the datasets used and their preparation, and the statistical test. A previous work presented some of these definitions; the present work enriches the procedure by adding datasets and by strengthening the statistical guarantees when comparing the models' results. We performed this benchmark evaluation for non-generative adversarial models, analyzing the models from the literature under the same metric. The assessment did not identify a single model that performs best on all datasets. However, it established, with statistical confidence, how each model performs on each dataset.
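The exact definition of $FU\text{-}score$ is not reproduced in this abstract. As a purely illustrative sketch, assuming an F1-style harmonic mean of a utility metric (here accuracy) and a fairness score (here 1 minus the demographic parity difference between two groups), a single trade-off number could be computed as follows. The function names, the choice of metrics, and the harmonic-mean form are all assumptions, not the paper's definition.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Utility metric (assumed): fraction of correct binary predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def demographic_parity_difference(y_pred: Sequence[int], group: Sequence[int]) -> float:
    """Absolute gap in positive-prediction rates between two groups (0 means parity)."""
    if len(y_pred) != len(group):
        raise ValueError("y_pred and group must have equal length")
    rates = {}
    for g in set(group):
        preds = [p for p, gg in zip(y_pred, group) if gg == g]
        rates[g] = sum(preds) / len(preds)
    if len(rates) != 2:
        raise ValueError("this sketch assumes exactly two groups")
    rate_a, rate_b = rates.values()
    return abs(rate_a - rate_b)


def trade_off_score(utility: float, fairness: float) -> float:
    """Hypothetical F1-style harmonic mean of utility and fairness, both in [0, 1]."""
    if utility + fairness == 0:
        return 0.0
    return 2 * utility * fairness / (utility + fairness)


if __name__ == "__main__":
    # Toy data: binary labels, predictions, and a binary sensitive attribute.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
    group = [0, 0, 0, 0, 1, 1, 1, 1]

    utility = accuracy(y_true, y_pred)
    fairness = 1.0 - demographic_parity_difference(y_pred, group)
    score = trade_off_score(utility, fairness)
    # prints: utility=0.875 fairness=0.750 score=0.808
    print(f"utility={utility:.3f} fairness={fairness:.3f} score={score:.3f}")
```

A harmonic mean penalizes a model that scores well on one axis but poorly on the other, which is one common way to fold a trade-off into a single comparable number.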

Source journal: Computational Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 6.90

Self-citation rate: 3.60%

Review time: >12 weeks

About the journal: This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues, from the tools and languages of AI to its philosophical implications, Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.