{"title":"Statement of Peer Review","authors":"Kuan-Chuan Peng, Ziyan Wu","doi":"10.3390/cmsf2022003012","DOIUrl":"https://doi.org/10.3390/cmsf2022003012","url":null,"abstract":"","PeriodicalId":127261,"journal":{"name":"AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116282857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shunsuke Kogure, Kai Watabe, Ryosuke Yamada, Y. Aoki, Akio Nakamura, Hirokatsu Kataoka
Why is there a the disparity in the miss rates of pedestrian detection between different age attributes? In this study, we propose to (i) improve the accuracy of pedestrian detection using our pre-trained model and (ii) explore the causes of this disparity. In order to improve detection accuracy, we extend a pedestrian detection pre-training dataset, the Weakly Supervised Pedestrian Dataset (WSPD), by means of self-training, to construct our Self-Trained Person Dataset (STPD). More-over, we hypothesise the cause of the miss rate as being due to three biases: 1) the apparent bias towards “adults” versus “children,” 2) the quantity of training data bias against “chil- dren,” and 3) the scale bias of the bounding box. In addition, we constructed an evaluation dataset by manually annotat- ing “adult” and “child” bounding boxes to the INRIA Person Dataset. As a result, we confirm that the miss rate was re- duced by up to 0.4% for adults and up to 3.9% for children. In addition, we discuss the impact of the size and appearance of the bounding boxes on the disparity in miss rates and pro-vide an outlook for future research.
为什么不同年龄属性的行人检测失检率存在差异?在本研究中,我们建议(i)使用我们的预训练模型提高行人检测的准确性,(ii)探索这种差异的原因。为了提高检测精度,我们扩展了行人检测预训练数据集弱监督行人数据集(WSPD),通过自我训练来构建我们的自我训练人员数据集(STPD)。此外,我们假设缺失率的原因是由于三种偏差:1)对“成人”与“儿童”的明显偏差,2)对“儿童”的训练数据量偏差,以及3)边界框的尺度偏差。此外,我们通过在INRIA Person dataset上手动标注“成人”和“儿童”边界框,构建了一个评估数据集。结果,我们确认,失误率降低了——成人减少了0.4%,儿童减少了3.9%。此外,我们还讨论了边界框的大小和外观对脱靶率差异的影响,并对未来的研究进行了展望。
{"title":"Age Should Not Matter: Towards More Accurate Pedestrian Detection via Self-Training","authors":"Shunsuke Kogure, Kai Watabe, Ryosuke Yamada, Y. Aoki, Akio Nakamura, Hirokatsu Kataoka","doi":"10.3390/cmsf2022003011","DOIUrl":"https://doi.org/10.3390/cmsf2022003011","url":null,"abstract":"Why is there a the disparity in the miss rates of pedestrian detection between different age attributes? In this study, we propose to (i) improve the accuracy of pedestrian detection using our pre-trained model and (ii) explore the causes of this disparity. In order to improve detection accuracy, we extend a pedestrian detection pre-training dataset, the Weakly Supervised Pedestrian Dataset (WSPD), by means of self-training, to construct our Self-Trained Person Dataset (STPD). More-over, we hypothesise the cause of the miss rate as being due to three biases: 1) the apparent bias towards “adults” versus “children,” 2) the quantity of training data bias against “chil- dren,” and 3) the scale bias of the bounding box. In addition, we constructed an evaluation dataset by manually annotat- ing “adult” and “child” bounding boxes to the INRIA Person Dataset. As a result, we confirm that the miss rate was re- duced by up to 0.4% for adults and up to 3.9% for children. In addition, we discuss the impact of the size and appearance of the bounding boxes on the disparity in miss rates and pro-vide an outlook for future research.","PeriodicalId":127261,"journal":{"name":"AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117146113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
: Few-shot semantic segmentation aims to transfer knowledge from base classes with sufficient data to represent novel classes with limited few-shot samples. Recent methods follow a metric learning framework with prototypes for foreground representation. However, they still face the challenge of segmentation of novel classes due to inadequate representation of foreground and lack of discriminability between foreground and background. To address this problem, we propose the Dual Complementary prototype Network (DCNet). Firstly, we design a training-free Complementary Prototype Generation (CPG) module to extract comprehensive information from the mask region in the support image. Secondly, we design a Background Guided Learning (BGL) as a complementary branch of the foreground segmentation branch, which enlarges difference between the foreground and its corresponding background so that the representation of novel class in the foreground could be more discriminative. Extensive experiments on PASCAL-5 i and COCO-20 i demonstrate that our DCNet achieves state-of-the-art
{"title":"Dual Complementary Prototype Learning for Few-Shot Segmentation","authors":"Q. Ren, Jie Chen","doi":"10.3390/cmsf2022003008","DOIUrl":"https://doi.org/10.3390/cmsf2022003008","url":null,"abstract":": Few-shot semantic segmentation aims to transfer knowledge from base classes with sufficient data to represent novel classes with limited few-shot samples. Recent methods follow a metric learning framework with prototypes for foreground representation. However, they still face the challenge of segmentation of novel classes due to inadequate representation of foreground and lack of discriminability between foreground and background. To address this problem, we propose the Dual Complementary prototype Network (DCNet). Firstly, we design a training-free Complementary Prototype Generation (CPG) module to extract comprehensive information from the mask region in the support image. Secondly, we design a Background Guided Learning (BGL) as a complementary branch of the foreground segmentation branch, which enlarges difference between the foreground and its corresponding background so that the representation of novel class in the foreground could be more discriminative. Extensive experiments on PASCAL-5 i and COCO-20 i demonstrate that our DCNet achieves state-of-the-art","PeriodicalId":127261,"journal":{"name":"AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128747245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose the task of extracting salient facts from online company reviews. Salient facts present unique and distinctive information about a company, which helps the user in deciding whether to apply to the company. We formulate the salient fact extraction task as a text classification problem, and leverage pretrained language models to tackle the problem. However, the scarcity of salient facts in company reviews causes a serious label imbalance issue, which hinders taking full advantage of pretrained language models. To address the issue, we developed two data enrichment methods: first, representation enrichment, which highlights uncommon tokens by appending special tokens, and second, label propagation, which interactively creates pseudopositive examples from unlabeled data. Experimental results on an online company review corpus show that our approach improves the performance of pretrained language models by up to an F1 score of 0.24. We also confirm that our approach competitively performs well against the state-of-the-art data augmentation method on the SemEval 2019 benchmark even when trained with only 20% of
{"title":"Extracting Salient Facts from Company Reviews with Scarce Labels","authors":"Jinfeng Li, Nikita Bhutani, Alexander Whedon, Chieh-Yang Huang, Estevam Hruschka, Yoshihiko Suhara","doi":"10.3390/cmsf2022003009","DOIUrl":"https://doi.org/10.3390/cmsf2022003009","url":null,"abstract":"In this paper, we propose the task of extracting salient facts from online company reviews. Salient facts present unique and distinctive information about a company, which helps the user in deciding whether to apply to the company. We formulate the salient fact extraction task as a text classification problem, and leverage pretrained language models to tackle the problem. However, the scarcity of salient facts in company reviews causes a serious label imbalance issue, which hinders taking full advantage of pretrained language models. To address the issue, we developed two data enrichment methods: first, representation enrichment, which highlights uncommon tokens by appending special tokens, and second, label propagation, which interactively creates pseudopositive examples from unlabeled data. Experimental results on an online company review corpus show that our approach improves the performance of pretrained language models by up to an F1 score of 0.24. We also confirm that our approach competitively performs well against the state-of-the-art data augmentation method on the SemEval 2019 benchmark even when trained with only 20% of","PeriodicalId":127261,"journal":{"name":"AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125432127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kumpei Ikuta, H. Iyatomi, K. Oishi, on behalf of the Alzheimer’s Disease Neuroimaging Initiative
article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at Abstract: We propose two essential techniques to effectively train generative adversarial network-based super-resolution networks for brain magnetic resonance images, even when only a small number of training samples are available. First, stochastic patch sampling is proposed, which in-creases training samples by sampling many small patches from the input image. However, sampling patches and combining them causes unpleasant artifacts around patch boundaries. The second proposed method, an artifact-suppressing discriminator, suppresses the artifacts by taking two-channel input containing an original high-resolution image and a generated image. With the introduction of the proposed techniques, the network achieved generation of natural-looking MR images from only ~40 training images, and improved the area-under-curve score on Alzheimer’s disease from 76.17% to 81.57%.
{"title":"Super-Resolution for Brain MR Images from a Significantly Small Amount of Training Data","authors":"Kumpei Ikuta, H. Iyatomi, K. Oishi, on behalf of the Alzheimer’s Disease Neuroimaging Initiative","doi":"10.3390/cmsf2022003007","DOIUrl":"https://doi.org/10.3390/cmsf2022003007","url":null,"abstract":"article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at Abstract: We propose two essential techniques to effectively train generative adversarial network-based super-resolution networks for brain magnetic resonance images, even when only a small number of training samples are available. First, stochastic patch sampling is proposed, which in-creases training samples by sampling many small patches from the input image. However, sampling patches and combining them causes unpleasant artifacts around patch boundaries. The second proposed method, an artifact-suppressing discriminator, suppresses the artifacts by taking two-channel input containing an original high-resolution image and a generated image. With the introduction of the proposed techniques, the network achieved generation of natural-looking MR images from only ~40 training images, and improved the area-under-curve score on Alzheimer’s disease from 76.17% to 81.57%.","PeriodicalId":127261,"journal":{"name":"AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126994606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Megan Frisella, Pooya Khorrami, J. Matterer, K. Kratkiewicz, P. Torres-Carrasquillo
: Machine learning models perform face verification (FV) for a variety of highly consequential applications, such as biometric authentication, face identification, and surveillance. Many state-of-the-art FV systems suffer from unequal performance across demographic groups, which is commonly overlooked by evaluation measures that do not assess population-specific performance. Deployed systems with bias may result in serious harm against individuals or groups who experience underperformance. We explore several fairness definitions and metrics, attempting to quantify bias in Google’s FaceNet model. In addition to statistical fairness metrics, we analyze clustered face embeddings produced by the FV model. We link well-clustered embeddings (well-defined, dense clusters) for a demographic group to biased model performance against that group. We present the intuition that FV systems underperform on protected demographic groups because they are less sensitive to differences between features within those groups, as evidenced by clustered embeddings. We show how this performance discrepancy results from a combination of representation and aggregation bias. death times for White face embeddings to later than other race groups ( p < 0.05 for W × A , W × I , and W × B t -tests), indicating that White embeddings are more in the embedding space. The other race groups have peak death times that are taller and earlier than the White race group. The shorter and wider peak for the White subgroup means that there is more variety (higher variance) in H 0 death times, rather than the consistent peak around 0.8 with less variance for other race groups. This shows that there is more variance for White face distribution in the embedding space compared to other race groups, a trend that was not present in the centroid distance distribution for race groups, which showed four bell-shaped density plots. Thus, our analysis of the ( H 0 ) death times supports previous findings that the White race group is clustered differently to other race groups. We note that there is less inequality in H 0 death times for female vs. male faces, despite our p -value indicating that this discrepancy may be significant ( p < 0.05).
{"title":"Quantifying Bias in a Face Verification System","authors":"Megan Frisella, Pooya Khorrami, J. Matterer, K. Kratkiewicz, P. Torres-Carrasquillo","doi":"10.3390/cmsf2022003006","DOIUrl":"https://doi.org/10.3390/cmsf2022003006","url":null,"abstract":": Machine learning models perform face verification (FV) for a variety of highly consequential applications, such as biometric authentication, face identification, and surveillance. Many state-of-the-art FV systems suffer from unequal performance across demographic groups, which is commonly overlooked by evaluation measures that do not assess population-specific performance. Deployed systems with bias may result in serious harm against individuals or groups who experience underperformance. We explore several fairness definitions and metrics, attempting to quantify bias in Google’s FaceNet model. In addition to statistical fairness metrics, we analyze clustered face embeddings produced by the FV model. We link well-clustered embeddings (well-defined, dense clusters) for a demographic group to biased model performance against that group. We present the intuition that FV systems underperform on protected demographic groups because they are less sensitive to differences between features within those groups, as evidenced by clustered embeddings. We show how this performance discrepancy results from a combination of representation and aggregation bias. death times for White face embeddings to later than other race groups ( p < 0.05 for W × A , W × I , and W × B t -tests), indicating that White embeddings are more in the embedding space. The other race groups have peak death times that are taller and earlier than the White race group. The shorter and wider peak for the White subgroup means that there is more variety (higher variance) in H 0 death times, rather than the consistent peak around 0.8 with less variance for other race groups. This shows that there is more variance for White face distribution in the embedding space compared to other race groups, a trend that was not present in the centroid distance distribution for race groups, which showed four bell-shaped density plots. Thus, our analysis of the ( H 0 ) death times supports previous findings that the White race group is clustered differently to other race groups. We note that there is less inequality in H 0 death times for female vs. male faces, despite our p -value indicating that this discrepancy may be significant ( p < 0.05).","PeriodicalId":127261,"journal":{"name":"AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132036575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaoyan Zhuo, Wolfgang Rahfeldt, Xiaoqian Zhang, Ted Doros, S. Son
: Detecting defects, especially when they are small in the early manufacturing stages, is critical to achieving a high yield in industrial applications. While numerous modern deep learning models can improve detection performance, they become less effective in detecting small defects in practical applications due to the scarcity of labeled data and significant class imbalance in multiple dimensions. In this work, we propose a distribution-aware pseudo labeling method (DAP-SDD) to detect small defects accurately while using limited labeled data effectively. Specifically, we apply bootstrapping on limited labeled data and then utilize the approximated label distribution to guide pseudo label propagation. Moreover, we propose to use the t-distribution confidence interval for threshold setting to generate more pseudo labels with high confidence. DAP-SDD also incorporates data augmentation to enhance the model’s performance and robustness. We conduct extensive experiments on various datasets to validate the proposed method. Our evaluation results show that, overall, our proposed method requires less than 10% of labeled data to achieve comparable results of using a fully-labeled (100%) dataset and outperforms the state-of-the-art methods. For a dataset of wafer images, our proposed model can achieve above 0.93 of AP (average precision) with only four labeled images (i.e., 2% of labeled data).
{"title":"DAP-SDD: Distribution-Aware Pseudo Labeling for Small Defect Detection","authors":"Xiaoyan Zhuo, Wolfgang Rahfeldt, Xiaoqian Zhang, Ted Doros, S. Son","doi":"10.3390/cmsf2022003005","DOIUrl":"https://doi.org/10.3390/cmsf2022003005","url":null,"abstract":": Detecting defects, especially when they are small in the early manufacturing stages, is critical to achieving a high yield in industrial applications. While numerous modern deep learning models can improve detection performance, they become less effective in detecting small defects in practical applications due to the scarcity of labeled data and significant class imbalance in multiple dimensions. In this work, we propose a distribution-aware pseudo labeling method (DAP-SDD) to detect small defects accurately while using limited labeled data effectively. Specifically, we apply bootstrapping on limited labeled data and then utilize the approximated label distribution to guide pseudo label propagation. Moreover, we propose to use the t-distribution confidence interval for threshold setting to generate more pseudo labels with high confidence. DAP-SDD also incorporates data augmentation to enhance the model’s performance and robustness. We conduct extensive experiments on various datasets to validate the proposed method. Our evaluation results show that, overall, our proposed method requires less than 10% of labeled data to achieve comparable results of using a fully-labeled (100%) dataset and outperforms the state-of-the-art methods. For a dataset of wafer images, our proposed model can achieve above 0.93 of AP (average precision) with only four labeled images (i.e., 2% of labeled data).","PeriodicalId":127261,"journal":{"name":"AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125236256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel Y. Fu, Mayee F. Chen, Michael Zhang, K. Fatahalian, C. Ré
: Supervised contrastive learning optimizes a loss that pushes together embeddings of points from the same class while pulling apart embeddings of points from different classes. Class collapse—when every point from the same class has the same embedding—minimizes this loss but loses critical information that is not encoded in the class labels. For instance, the “cat” label does not capture unlabeled categories such as breeds, poses, or backgrounds (which we call “strata”). As a result, class collapse produces embeddings that are less useful for downstream applications such as transfer learning and achieves suboptimal generalization error when there are strata. We explore a simple modification to supervised contrastive loss that aims to prevent class collapse by uniformly pulling apart individual points from the same class. We seek to understand the effects of this loss by examining how it embeds strata of different sizes, finding that it clusters larger strata more tightly than smaller strata. As a result, our loss function produces embeddings that better distinguish strata in embedding space, which produces lift on three downstream applications: 4.4 points on coarse-to-fine transfer learning, 2.5 points on worst-group robustness, and 1.0 points on minimal coreset construction. Our loss also produces more accurate models, with up to 4.0 points of lift across 9 tasks.
{"title":"The Details Matter: Preventing Class Collapse in Supervised Contrastive Learning","authors":"Daniel Y. Fu, Mayee F. Chen, Michael Zhang, K. Fatahalian, C. Ré","doi":"10.3390/cmsf2022003004","DOIUrl":"https://doi.org/10.3390/cmsf2022003004","url":null,"abstract":": Supervised contrastive learning optimizes a loss that pushes together embeddings of points from the same class while pulling apart embeddings of points from different classes. Class collapse—when every point from the same class has the same embedding—minimizes this loss but loses critical information that is not encoded in the class labels. For instance, the “cat” label does not capture unlabeled categories such as breeds, poses, or backgrounds (which we call “strata”). As a result, class collapse produces embeddings that are less useful for downstream applications such as transfer learning and achieves suboptimal generalization error when there are strata. We explore a simple modification to supervised contrastive loss that aims to prevent class collapse by uniformly pulling apart individual points from the same class. We seek to understand the effects of this loss by examining how it embeds strata of different sizes, finding that it clusters larger strata more tightly than smaller strata. As a result, our loss function produces embeddings that better distinguish strata in embedding space, which produces lift on three downstream applications: 4.4 points on coarse-to-fine transfer learning, 2.5 points on worst-group robustness, and 1.0 points on minimal coreset construction. Our loss also produces more accurate models, with up to 4.0 points of lift across 9 tasks.","PeriodicalId":127261,"journal":{"name":"AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)","volume":"252 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123750541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
: Recent works in machine learning have focused on understanding and mitigating bias in data and algorithms. Because the pre-trained models are trained on large real-world data, they are known to learn implicit biases in a way that humans unconsciously constructed for a long time. However, there has been little discussion about social biases with pre-trained face recognition models. Thus, this study investigates the robustness of the models against racial, gender, age, and an intersectional bias. We also present the racial bias with a different ethnicity other than white and black: Asian. In detail, we introduce the Face Embedding Association Test (FEAT) to measure the social biases in image vectors of faces with different race, gender, and age. It measures social bias in the face recognition models under the hypothesis that a specific group is more likely to be associated with a particular attribute in a biased manner. The presence of these biases within DeepFace, DeepID, VGGFace, FaceNet, OpenFace, and ArcFace critically mitigate the fairness in our society.
{"title":"Measuring Embedded Human-Like Biases in Face Recognition Models","authors":"Sangeun Lee, Soyoung Oh, Minji Kim, Eunil Park","doi":"10.3390/cmsf2022003002","DOIUrl":"https://doi.org/10.3390/cmsf2022003002","url":null,"abstract":": Recent works in machine learning have focused on understanding and mitigating bias in data and algorithms. Because the pre-trained models are trained on large real-world data, they are known to learn implicit biases in a way that humans unconsciously constructed for a long time. However, there has been little discussion about social biases with pre-trained face recognition models. Thus, this study investigates the robustness of the models against racial, gender, age, and an intersectional bias. We also present the racial bias with a different ethnicity other than white and black: Asian. In detail, we introduce the Face Embedding Association Test (FEAT) to measure the social biases in image vectors of faces with different race, gender, and age. It measures social bias in the face recognition models under the hypothesis that a specific group is more likely to be associated with a particular attribute in a biased manner. The presence of these biases within DeepFace, DeepID, VGGFace, FaceNet, OpenFace, and ArcFace critically mitigate the fairness in our society.","PeriodicalId":127261,"journal":{"name":"AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121005011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
: Transformer models are now increasingly being used in real-world applications. Indiscrim-inately using these models as automated tools may propagate biases in ways we do not realize. To responsibly direct actions that will combat this problem, it is of crucial importance that we detect and quantify these biases. Robust methods have been developed to measure bias in non-contextualized embeddings. Nevertheless, these methods fail to apply to contextualized embeddings due to their mutable nature. Our study focuses on the detection and measurement of stereotypical biases associated with gender in the embeddings of T5 and mT5. We quantify bias by measuring the gender polarity of T5’s word embeddings for various professions. To measure gender polarity, we use a stable gender direction that we detect in the model’s embedding space. We also measure gender bias with respect to a specific downstream task and compare Swedish with English, as well as various sizes of the T5 model and its multilingual variant. The insights from our exploration indicate that the use of a stable gender direction, even in a Transformer’s mutable embedding space, can be a robust method to measure bias. We show that higher status professions are associated more with the male gender than the female gender. In addition, our method suggests that the Swedish language carries less bias associated with gender than English, and the higher manifestation of gender bias is associated with the use of larger language models.
{"title":"Measuring Gender Bias in Contextualized Embeddings","authors":"Styliani Katsarou, Borja Rodríguez-Gálvez, Jesse Shanahan","doi":"10.3390/cmsf2022003003","DOIUrl":"https://doi.org/10.3390/cmsf2022003003","url":null,"abstract":": Transformer models are now increasingly being used in real-world applications. Indiscrim-inately using these models as automated tools may propagate biases in ways we do not realize. To responsibly direct actions that will combat this problem, it is of crucial importance that we detect and quantify these biases. Robust methods have been developed to measure bias in non-contextualized embeddings. Nevertheless, these methods fail to apply to contextualized embeddings due to their mutable nature. Our study focuses on the detection and measurement of stereotypical biases associated with gender in the embeddings of T5 and mT5. We quantify bias by measuring the gender polarity of T5’s word embeddings for various professions. To measure gender polarity, we use a stable gender direction that we detect in the model’s embedding space. We also measure gender bias with respect to a specific downstream task and compare Swedish with English, as well as various sizes of the T5 model and its multilingual variant. The insights from our exploration indicate that the use of a stable gender direction, even in a Transformer’s mutable embedding space, can be a robust method to measure bias. We show that higher status professions are associated more with the male gender than the female gender. In addition, our method suggests that the Swedish language carries less bias associated with gender than English, and the higher manifestation of gender bias is associated with the use of larger language models.","PeriodicalId":127261,"journal":{"name":"AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)","volume":"1 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131539652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}