DSAP: Analyzing bias through demographic comparison of datasets
Iris Dominguez-Catena, Daniel Paternain, Mikel Galar
Information Fusion, Volume 115, Article 102760 (published 2024-10-29). DOI: 10.1016/j.inffus.2024.102760
https://www.sciencedirect.com/science/article/pii/S1566253524005384
Abstract
In the last few years, Artificial Intelligence (AI) systems have become increasingly widespread. Unfortunately, these systems can share many biases with human decision-making, including demographic biases. Often, these biases can be traced back to the data used for training, where large uncurated datasets have become the norm. Despite our awareness of these biases, we still lack general tools to detect, quantify, and compare them across different datasets. In this work, we propose DSAP (Demographic Similarity from Auxiliary Profiles), a two-step methodology for comparing the demographic composition of datasets. First, DSAP uses existing demographic estimation models to extract a dataset’s demographic profile. Second, it applies a similarity metric to compare the demographic profiles of different datasets. While these individual components are well-known, their joint use for demographic dataset comparison is novel and has not been previously addressed in the literature. This approach allows three key applications: the identification of demographic blind spots and bias issues across datasets, the measurement of demographic bias, and the assessment of demographic shifts over time. DSAP can be used on datasets with or without explicit demographic information, provided that demographic information can be derived from the samples using auxiliary models, such as those for image or voice datasets. To show the usefulness of the proposed methodology, we consider the Facial Expression Recognition task, where demographic bias has previously been found. The three applications are studied over a set of twenty datasets with varying properties. The code is available at https://github.com/irisdominguez/DSAP.
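The two-step idea described above can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the linked repository). It assumes an auxiliary model has already assigned each sample a demographic group label, and it assumes a simple overlap similarity (one minus half the L1 distance between profiles), since the abstract does not name the metric. The function names `demographic_profile` and `profile_similarity`, the group labels, and the toy data are all hypothetical.

```python
from collections import Counter


def demographic_profile(labels, groups):
    """Step 1 (assumed form): share of each group among the predicted labels."""
    counts = Counter(labels)
    total = sum(counts[g] for g in groups)
    if total == 0:
        raise ValueError("no labels fall into the given groups")
    return [counts[g] / total for g in groups]


def profile_similarity(p, q):
    """Step 2 (assumed metric): 1 - 0.5 * sum(|p_i - q_i|).

    Returns 1.0 for identical profiles and 0.0 for profiles with no overlap.
    """
    if len(p) != len(q):
        raise ValueError("profiles must cover the same groups")
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical group labels, as if produced by an auxiliary estimation model.
groups = ["group_a", "group_b", "group_c"]
labels_x = ["group_a", "group_a", "group_b", "group_c"]
labels_y = ["group_a", "group_b", "group_b", "group_b"]

profile_x = demographic_profile(labels_x, groups)
profile_y = demographic_profile(labels_y, groups)

# Dataset-to-dataset comparison.
similarity_xy = profile_similarity(profile_x, profile_y)

# Comparing against a uniform profile gives one possible bias measurement.
uniform = [1 / len(groups)] * len(groups)
balance_x = profile_similarity(profile_x, uniform)

print(profile_x, profile_y, similarity_xy, balance_x)
```

In this sketch, dataset Y contains no samples from `group_c`, which shows up as a zero entry in its profile. That is the kind of demographic blind spot the abstract mentions. The same comparison applied to snapshots of one dataset taken at different times would give a rough indication of demographic shift.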
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.