SAM-LAD: Segment Anything Model meets zero-shot logic anomaly detection

IF 7.6 · CAS Zone 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence) · Knowledge-Based Systems · Published: 2025-04-08 · Epub: 2025-02-20 · DOI: 10.1016/j.knosys.2025.113176
Yun Peng, Xiao Lin, Nachuan Ma, Jiayuan Du, Chuangwei Liu, Chengju Liu, Qijun Chen
Knowledge-Based Systems, Volume 314, Article 113176. https://www.sciencedirect.com/science/article/pii/S0950705125002230
Citations: 0

Abstract

Visual anomaly detection is vital in real-world applications such as industrial defect detection and medical diagnosis. However, most existing methods focus on local structural anomalies and fail to detect higher-level functional anomalies under logical conditions. Recent studies have explored logical anomaly detection. However, they can only handle simple anomalies, such as missing or added objects, and they generalize poorly because they are heavily data-driven. To fill this gap, we propose SAM-LAD, a zero-shot, plug-and-play framework for anomaly detection in any scene. First, we obtain the feature map of a query image using a pre-trained backbone. At the same time, we retrieve reference images and their corresponding feature maps via nearest-neighbor search. Then, we use the Segment Anything Model (SAM) to obtain object masks for the query and reference images. Each object mask is multiplied by the feature map of the entire image to obtain an object feature map. Next, we propose an Object Matching Model (OMM) to match objects in the query image with objects in the reference images. To facilitate object matching, we propose a Dynamic Channel Graph Attention (DCGA) module. This module treats each object as a keypoint and converts its feature maps into feature vectors. Finally, we propose an Anomaly Measurement Model (AMM), based on the object matching relations, to detect objects with logical anomalies. Structural anomalies within objects can also be detected. We validate SAM-LAD on various benchmarks, including industrial datasets (MVTec LOCO AD, MVTec AD) and a logical dataset (DigitAnatomy). Extensive experimental results demonstrate that SAM-LAD outperforms existing state-of-the-art (SoTA) methods, particularly in detecting logical anomalies.
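The pipeline stages described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Object masks and feature maps are taken as given; in SAM-LAD they come from SAM and a pre-trained backbone. Masked average pooling stands in for the DCGA module. Mutual-nearest-neighbor matching on cosine similarity stands in for the OMM. The fraction of unmatched objects is a hypothetical placeholder for the AMM score. All function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers sketching the SAM-LAD stages; not the authors' code.


def l2_normalize(x, axis=-1):
    """Scale vectors to unit L2 norm (epsilon avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)


def retrieve_references(query_global, bank_global, k):
    """Return indices of the k reference images whose global descriptors
    are most cosine-similar to the query descriptor."""
    sims = l2_normalize(bank_global) @ l2_normalize(query_global)
    return np.argsort(-sims)[:k]


def masked_object_vector(feature_map, mask):
    """Multiply a binary object mask (H, W) with the image feature map
    (C, H, W) to get the object feature map, then average-pool it into a
    C-dim vector. Pooling is an assumed stand-in for the DCGA module."""
    object_map = feature_map * mask[None, :, :]
    area = max(float(mask.sum()), 1.0)
    return object_map.reshape(feature_map.shape[0], -1).sum(axis=1) / area


def match_objects(query_vecs, ref_vecs, min_sim=0.5):
    """Mutual-nearest-neighbour matching on cosine similarity, an assumed
    substitute for the OMM. Returns (query_idx, ref_idx, similarity)."""
    sims = l2_normalize(query_vecs) @ l2_normalize(ref_vecs).T
    best_ref = sims.argmax(axis=1)    # best reference object per query object
    best_query = sims.argmax(axis=0)  # best query object per reference object
    matches = []
    for q, r in enumerate(best_ref):
        if best_query[r] == q and sims[q, r] >= min_sim:
            matches.append((q, int(r), float(sims[q, r])))
    return matches


def logical_anomaly_score(n_query, n_ref, matches):
    """Hypothetical AMM placeholder: fraction of all objects left unmatched
    (missing or extra objects push the score up)."""
    unmatched = (n_query - len(matches)) + (n_ref - len(matches))
    return unmatched / max(n_query + n_ref, 1)


# Toy usage: the query shows only two of the three reference objects.
ref_objs = np.eye(3)
query_objs = np.eye(3)[:2]
m = match_objects(query_objs, ref_objs)
print(len(m), logical_anomaly_score(len(query_objs), len(ref_objs), m))  # 2 0.2
```

In the toy example, the query lacks one reference object. One of five objects therefore stays unmatched, and the score is 0.2. When several objects look alike, a strict one-to-one assignment such as the Hungarian algorithm would be a natural alternative to mutual nearest neighbors.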
Source journal
Knowledge-Based Systems (Engineering and Technology; Computer Science: Artificial Intelligence)
CiteScore: 14.80
Self-citation rate: 12.50%
Articles published: 1245
Review time: 7.8 months
About the journal: Knowledge-Based Systems is an international and interdisciplinary journal in artificial intelligence. It publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques. It also aims to provide balanced coverage of theory and practical study and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.