Human-Guided Zero-Shot Surface Defect Semantic Segmentation

IEEE Transactions on Instrumentation and Measurement (IF 5.9; CAS Zone 2, Engineering & Technology; JCR Q1, Engineering, Electrical & Electronic) Pub Date: 2025-02-11 DOI: 10.1109/TIM.2025.3538058
Yuxin Jin;Yunzhou Zhang;Dexing Shan;Zhifei Wu
Volume 74, pages 1-13. Journal Article. Not open access. Source: https://ieeexplore.ieee.org/document/10880677/
Citations: 0

Abstract

Existing surface defect semantic segmentation methods depend on costly annotated data and cannot cope with new or rare defect types. Zero-shot learning offers a way to address this issue by reducing reliance on extensive annotated data. However, methods that rely solely on image information waste the valuable experience that humans have accumulated in defect detection. In this work, we propose a human-guided segmentation network (HGNet) based on CLIP, which introduces human guidance to address data scarcity and leverage expert knowledge, leading to more accurate and reliable surface defect segmentation. HGNet, guided by human-provided text, consists of two novel modules: 1) attention-based multilevel feature fusion (AMFF), which integrates multilevel features using attention mechanisms to enhance fine-grained information capture, and 2) multimodal feature adaptive balancing (MFAB), which aligns and balances multimodal features through dynamic adjustment and optimization. Moreover, we extend HGNet to HGNet+ by incorporating interactive learning to correct segmentation errors with human-provided points. The proposed method generalizes to unseen classes without requiring additional training samples for retraining, meeting the practical needs of industrial defect detection. Extensive experiments on Defect-$4^{i}$ (and MVTec-ZSS) demonstrate that our method outperforms state-of-the-art zero-shot methods by 5.7%/7.81% (6.57%/8.06%) and is even comparable in performance to existing few-shot methods.
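The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind text-guided zero-shot segmentation in a CLIP-style shared embedding space. It is not HGNet, AMFF, or MFAB. Each pixel feature is compared with the embedding of every class description, and the pixel takes the label with the highest cosine similarity. The two-dimensional embeddings, the label names, and the functions `cosine` and `segment` are hypothetical and chosen for illustration; in practice the embeddings would come from pretrained image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def segment(pixel_feats, text_embs):
    """Label every pixel with the class whose text embedding is most similar.

    pixel_feats: H x W nested list, each cell a feature vector (assumed to
                 live in the same space as the text embeddings).
    text_embs:   dict mapping a class name to its text embedding.
    """
    mask = []
    for row in pixel_feats:
        mask.append(
            [max(text_embs, key=lambda name: cosine(feat, text_embs[name]))
             for feat in row]
        )
    return mask


if __name__ == "__main__":
    # Toy, hand-made embeddings (hypothetical values, not CLIP outputs).
    text_embs = {"background": [1.0, 0.0], "scratch": [0.0, 1.0]}
    feats = [[[0.9, 0.1], [0.2, 0.8]]]
    print(segment(feats, text_embs))

    # An unseen class is handled by adding a new text embedding only;
    # nothing is retrained in this sketch.
    text_embs["pit"] = [-1.0, 0.0]
    print(segment([[[-0.9, 0.1]]], text_embs))
```

The sketch shows why a text-guided formulation can extend to unseen defect classes: supporting a new class only requires a new text embedding, while the matching rule stays unchanged.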
Source journal
IEEE Transactions on Instrumentation and Measurement (Engineering & Technology; Engineering, Electrical & Electronic)
CiteScore: 9.00
Self-citation rate: 23.20%
Publication volume: 1294
Review time: 3.9 months
Journal description: Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support to establishment and maintenance of technical standards in the field of Instrumentation and Measurement.