{"title":"SAM-LAD: Segment Anything Model meets zero-shot logic anomaly detection","authors":"Yun Peng, Xiao Lin, Nachuan Ma, Jiayuan Du, Chuangwei Liu, Chengju Liu, Qijun Chen","doi":"10.1016/j.knosys.2025.113176","DOIUrl":null,"url":null,"abstract":"<div><div>Visual anomaly detection is vital in real-world applications, such as industrial defect detection and medical diagnosis. However, most existing methods focus on local structural anomalies and fail to detect higher-level functional anomalies under logical conditions. Although recent studies have explored logical anomaly detection, they can only address simple anomalies like missing or addition and show poor generalizability due to being heavily data-driven. To fill this gap, we propose SAM-LAD, a zero-shot, plug-and-play framework for anomaly detection in any scene. First, we obtain a query image’s feature map using a pre-trained backbone. Simultaneously, we retrieve the reference images and their corresponding feature maps via the nearest neighbor search. Then, we introduce the Segment Anything Model (SAM) to obtain object masks of the query and reference images. Each object mask is multiplied by the entire image’s feature map to obtain object feature maps. Next, an Object Matching Model (OMM) is proposed to match objects in the query and reference images. To facilitate object matching, we propose a Dynamic Channel Graph Attention (DCGA) module, treating each object as a keypoint and converting its feature maps into feature vectors. Finally, based on the object matching relations, an Anomaly Measurement Model (AMM) is proposed to detect objects with logical anomalies. Structural anomalies in the objects can also be detected. We validate our proposed SAM-LAD using various benchmarks, including industrial datasets (MVTec Loco AD, MVTec AD), and the logical dataset (DigitAnatomy). Extensive experimental results demonstrate that SAM-LAD outperforms existing SoTA methods, particularly in detecting logical anomalies.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"314 ","pages":"Article 113176"},"PeriodicalIF":7.6000,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125002230","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/20 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Visual anomaly detection is vital in real-world applications, such as industrial defect detection and medical diagnosis. However, most existing methods focus on local structural anomalies and fail to detect higher-level functional anomalies under logical conditions. Although recent studies have explored logical anomaly detection, they can only address simple anomalies like missing or addition and show poor generalizability due to being heavily data-driven. To fill this gap, we propose SAM-LAD, a zero-shot, plug-and-play framework for anomaly detection in any scene. First, we obtain a query image’s feature map using a pre-trained backbone. Simultaneously, we retrieve the reference images and their corresponding feature maps via the nearest neighbor search. Then, we introduce the Segment Anything Model (SAM) to obtain object masks of the query and reference images. Each object mask is multiplied by the entire image’s feature map to obtain object feature maps. Next, an Object Matching Model (OMM) is proposed to match objects in the query and reference images. To facilitate object matching, we propose a Dynamic Channel Graph Attention (DCGA) module, treating each object as a keypoint and converting its feature maps into feature vectors. Finally, based on the object matching relations, an Anomaly Measurement Model (AMM) is proposed to detect objects with logical anomalies. Structural anomalies in the objects can also be detected. We validate our proposed SAM-LAD using various benchmarks, including industrial datasets (MVTec Loco AD, MVTec AD), and the logical dataset (DigitAnatomy). Extensive experimental results demonstrate that SAM-LAD outperforms existing SoTA methods, particularly in detecting logical anomalies.
视觉异常检测在实际应用中是至关重要的,例如工业缺陷检测和医疗诊断。然而,现有的方法大多关注局部结构异常,无法检测到逻辑条件下更高层次的功能异常。虽然最近的研究已经探索了逻辑异常检测,但它们只能处理简单的异常,如缺失或添加,并且由于大量数据驱动而表现出较差的通用性。为了填补这一空白,我们提出了SAM-LAD,一种零射击,即插即用的框架,用于任何场景的异常检测。首先,我们使用预先训练好的主干来获得查询图像的特征映射。同时,我们通过最近邻搜索来检索参考图像及其对应的特征映射。然后,我们引入分段任意模型(SAM)来获取查询和参考图像的对象掩码。每个目标掩码乘以整个图像的特征映射,得到目标特征映射。其次,提出了一个对象匹配模型(OMM)来匹配查询和参考图像中的对象。为了方便目标匹配,我们提出了动态通道图注意(DCGA)模块,将每个目标作为关键点,并将其特征映射转换为特征向量。最后,基于对象匹配关系,提出了一种异常测量模型(AMM)来检测具有逻辑异常的对象。物体的结构异常也可以被检测到。我们使用各种基准测试来验证我们提出的SAM-LAD,包括工业数据集(MVTec Loco AD, MVTec AD)和逻辑数据集(DigitAnatomy)。大量的实验结果表明,SAM-LAD优于现有的SoTA方法,特别是在检测逻辑异常方面。
期刊介绍:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.