Title: Face Biometric Fairness Evaluation on Real vs Synthetic Cross-Spectral Images
Authors: K. Lai, V. Shmerko, S. Yanushkevich
DOI: 10.1109/CogMI56440.2022.00024
Published in: 2022 IEEE 4th International Conference on Cognitive Machine Intelligence (CogMI)
Publication date: 2022-09-19
Citations: 0
Abstract
In this paper, we compare the performance and fairness metrics on visual and thermal images of faces, including synthetic images of human subjects with face masks. The comparative experiment is performed on two datasets: the SpeakingFace and Thermal-Mask datasets. We assess fairness on real images and show how the same process can be applied to synthetic images. The chosen fairness metrics include the demographic parity difference and the equalized odds difference. While the demographic parity difference is assessed as 1.24 for random guessing in the process of face identification, it reaches 5.0 when both the precision and recall rates approach 99.99%. These results confirm that inherently biased datasets significantly impact the fairness of any biometric system. For biometric-enabled systems, fairness is related to the adequacy of the data to represent different groups of human subjects. In this paper, we focus on three demographic attributes: age, gender, and ethnicity. A primary cause of bias with respect to these groups is the class imbalance introduced through the data collection process. To address imbalanced datasets, the classes with fewer samples can be augmented with synthetic images to generate a more balanced dataset, resulting in less bias when training a machine learning system. The study shows that fairness is correlated with the performance of the system rather than with the genesis of the images (real or synthetic). The experiment on a simple 3-Block CNN with a precision and recall rate of 99.99%, using the demographic parity difference as an estimate of fairness, showed that among gender, ethnicity, and age, the first two are more sensitive attributes, while age is the least sensitive.
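The two fairness metrics named in the abstract have standard definitions that are straightforward to compute from model predictions. Below is a minimal sketch in plain NumPy, assuming the common formulations: the demographic parity difference as the largest gap in positive-prediction rate across groups, and the equalized odds difference as the larger of the maximum true-positive-rate gap and maximum false-positive-rate gap. The toy labels, predictions, and group assignments are illustrative only, not data from the paper.

```python
# Hedged sketch of the two fairness metrics mentioned in the abstract.
# All data below is synthetic/illustrative; "a" and "b" are hypothetical
# demographic groups, not groups from the SpeakingFace or Thermal-Mask datasets.
import numpy as np

def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rate across demographic groups."""
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the max TPR gap and max FPR gap across groups.
    Assumes every group contains both positive and negative ground-truth labels."""
    tprs, fprs = [], []
    for g in np.unique(groups):
        mask = groups == g
        t, p = y_true[mask], y_pred[mask]
        tprs.append(p[t == 1].mean())  # true positive rate within group g
        fprs.append(p[t == 0].mean())  # false positive rate within group g
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# Toy example: 8 samples, two groups of 4.
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

print(demographic_parity_difference(y_pred, groups))          # 0.75 - 0.50 = 0.25
print(equalized_odds_difference(y_true, y_pred, groups))      # TPR gap 0.5 > FPR gap 0.0
```

A value of 0 on either metric means the classifier treats the groups identically under that criterion; larger values indicate larger disparities, which is why the abstract's reported rise from 1.24 (random guessing) to 5.0 (near-perfect precision/recall) signals that higher-performing models can expose dataset bias more sharply.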