DAP-SDD: Distribution-Aware Pseudo Labeling for Small Defect Detection

AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD). Pub Date: 2022-04-20. DOI: 10.3390/cmsf2022003005

Xiaoyan Zhuo, Wolfgang Rahfeldt, Xiaoqian Zhang, Ted Doros, S. Son


Citations: 1

Abstract

Detecting defects, especially small ones in the early manufacturing stages, is critical to achieving a high yield in industrial applications. While numerous modern deep learning models can improve detection performance, they become less effective at detecting small defects in practical applications because labeled data is scarce and class imbalance is significant in multiple dimensions. In this work, we propose a distribution-aware pseudo labeling method (DAP-SDD) to detect small defects accurately while using limited labeled data effectively. Specifically, we apply bootstrapping on limited labeled data and then utilize the approximated label distribution to guide pseudo label propagation. Moreover, we propose to use the t-distribution confidence interval for threshold setting to generate more pseudo labels with high confidence. DAP-SDD also incorporates data augmentation to enhance the model's performance and robustness. We conduct extensive experiments on various datasets to validate the proposed method. Our evaluation results show that, overall, our proposed method requires less than 10% of labeled data to achieve results comparable to those obtained with a fully labeled (100%) dataset, and that it outperforms the state-of-the-art methods. For a dataset of wafer images, our proposed model achieves an AP (average precision) above 0.93 with only four labeled images (i.e., 2% of labeled data).
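
The abstract names two statistical ingredients: bootstrapping over the few labeled images, and a t-distribution confidence interval used to set the pseudo-label threshold. The Python sketch below illustrates those two ingredients in isolation. It is not the authors' implementation. The statistic being tracked (the per-image fraction of defect pixels), the function names, the sample values, and the rule that maps the interval to a score threshold are all assumptions made for illustration. Only the standard library is used, so the t critical value is found by numerically integrating the t density.

```python
import math
import random
import statistics


def bootstrap_means(values, n_resamples=1000, seed=0):
    """Resample the labeled values with replacement; return the mean of each resample."""
    rng = random.Random(seed)
    n = len(values)
    return [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)]


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using the symmetry of the density."""
    if x == 0:
        return 0.5
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_critical(df, confidence=0.95):
    """Two-sided critical value, found by bracketing and bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_confidence_interval(sample, confidence=0.95):
    """Interval mean +/- t * s / sqrt(n) for a small sample."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)
    margin = t_critical(n - 1, confidence) * sem
    return mean - margin, mean + margin


def threshold_for_fraction(scores, fraction):
    """Score cut-off at which roughly `fraction` of the scores are at or above it."""
    k = max(1, round(fraction * len(scores)))
    return sorted(scores, reverse=True)[k - 1]


if __name__ == "__main__":
    # Hypothetical defect-pixel fractions measured on four labeled images.
    labeled_fractions = [0.010, 0.014, 0.008, 0.012]
    approx = bootstrap_means(labeled_fractions)
    low, high = t_confidence_interval(labeled_fractions)
    print("bootstrap estimate of mean fraction:", statistics.fmean(approx))
    print("95% t-interval for the fraction:", (low, high))

    # Hypothetical per-pixel defect scores predicted for one unlabeled image.
    rng = random.Random(1)
    scores = [rng.random() for _ in range(10_000)]
    # Assumed rule: keep as many pseudo-labels as the upper bound allows.
    print("score threshold:", threshold_for_fraction(scores, high))
```

With only four labeled images the interval has three degrees of freedom, so it is much wider than a normal-approximation interval would be. That width is the reason a t-based interval suits this small-sample setting.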


