ETDNet: Efficient Transformer-Based Detection Network for Surface Defect Detection

IF 8.3 | Zone 2, Materials Science | Q1, MATERIALS SCIENCE, MULTIDISCIPLINARY | ACS Applied Materials & Interfaces | Pub Date: 2023-01-01 | DOI: 10.1109/TIM.2023.3307753

Hantao Zhou, Rui Yang, Runze Hu, Chang Shu, Xiaochu Tang, Xiu Li

Citations: 1

Abstract

Deep learning (DL)-based surface defect detectors play a crucial role in ensuring product quality during inspection. However, detecting defects both accurately and efficiently remains challenging because defect images have characteristic properties, namely a high degree of foreground–background similarity, scale variation, and shape variation. To address this challenge, we propose ETDNet, an efficient transformer-based detection network built around three novel designs. First, ETDNet uses a lightweight vision transformer (ViT) to extract representative global features, which allows defects to be characterized accurately even when they closely resemble the background. Second, a channel-modulated feature pyramid network (CM-FPN) is devised to fuse multilevel features while preserving critical information from each level. Finally, a novel task-oriented decoupled (TOD) head is introduced to tackle the inconsistency between the representations needed for classification and for regression. The TOD head employs a local feature representation (LFR) module to learn object-aware local features and an attention-based global feature representation (GFR) module to learn content-aware global features. By integrating these two modules into the head, ETDNet can effectively classify and localize defects of varying shapes and scales. Extensive experiments on several defect detection datasets demonstrate the effectiveness of ETDNet. For instance, on NEU-DET it achieves an AP of 46.7% (versus 45.9%) and an $\mathrm{AP}_{50}$ of 80.2% (versus 79.1%) while running at 49 frames/s. The code is available at https://github.com/zht8506/ETDNet.
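The abstract does not specify how CM-FPN modulates channels. As a rough illustration only, the following pure-Python sketch assumes a top-down FPN merge (nearest-neighbour 2x upsampling of the coarser level plus element-wise addition) followed by a squeeze-and-excitation-style channel gate that rescales each fused channel by a weight in (0, 1). The function names, the weight shapes, and the SE-style gate itself are assumptions made for this sketch, not details taken from the paper or from the ETDNet repository.

```python
import math
from typing import List

Channel = List[List[float]]  # one H x W feature map
Features = List[Channel]     # C channels
Matrix = List[List[float]]   # fully connected layer weights (out x in)


def upsample_nearest_2x(feat: Features) -> Features:
    """Nearest-neighbour 2x upsampling of every channel."""
    out = []
    for ch in feat:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row
        out.append(rows)
    return out


def add_features(a: Features, b: Features) -> Features:
    """Element-wise sum of two feature maps with identical shapes."""
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def channel_gate(feat: Features, w1: Matrix, w2: Matrix) -> List[float]:
    """SE-style gate: global average pool -> FC -> ReLU -> FC -> sigmoid.

    Assumption: ETDNet's actual CM-FPN modulation may differ.
    """
    # Squeeze: average each channel down to one number.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    # Excitation: bottleneck layer with ReLU, then one weight per channel.
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    logits = [sum(w * v for w, v in zip(row, h)) for row in w2]
    return [1.0 / (1.0 + math.exp(-s)) for s in logits]


def fuse_levels(fine: Features, coarse: Features, w1: Matrix, w2: Matrix) -> Features:
    """Hypothetical fusion: merge two pyramid levels, then reweight channels."""
    fused = add_features(fine, upsample_nearest_2x(coarse))
    gates = channel_gate(fused, w1, w2)
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, fused)]


if __name__ == "__main__":
    fine = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]  # 2 x 2 x 2
    coarse = [[[1.0]], [[2.0]]]                                  # 2 x 1 x 1
    w1 = [[0.5, 0.5]]      # C=2 -> C/r=1 (reduction r=2)
    w2 = [[1.0], [-1.0]]   # C/r=1 -> C=2
    out = fuse_levels(fine, coarse, w1, w2)
    print(round(out[0][0][0], 4), round(out[1][0][0], 4))  # 1.7616 0.2384
```

In this toy example both fused channels equal 2 everywhere, and the hand-picked second-layer weights give gate logits of +2 and -2, so the first channel keeps about 88% of its value while the second is suppressed to about 12%. In a trained network these weights would be learned, and the maps would be tensors rather than nested lists.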

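Source journal: ACS Applied Materials & Interfaces (Engineering & Technology: Materials Science, Multidisciplinary)

CiteScore: 16.00

Self-citation rate: 6.30%

Articles published: 4978

Review time: 1.8 months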

Journal description: ACS Applied Materials & Interfaces is a leading interdisciplinary journal that brings together chemists, engineers, physicists, and biologists to explore the development and utilization of newly discovered materials and interfacial processes for specific applications. Our journal has experienced remarkable growth since its establishment in 2009, both in the number of articles published and in the impact of the research showcased. We are proud to foster a truly global community, with the majority of published articles originating from outside the United States, reflecting the rapid growth of applied research worldwide.