Title: A Transformer Architecture based mutual attention for Image Anomaly Detection
Authors: Mengting Zhang, Xiuxia Tian
Journal: Virtual Reality Intelligent Hardware, volume 5, issue 1, pages 57-67 (JCR Q1, Computer Science)
Published: 2023-02-01 (Journal Article)
DOI: 10.1016/j.vrih.2022.07.006
URL: https://www.sciencedirect.com/science/article/pii/S2096579622000687
Citation count: 0
Abstract
Background
Image anomaly detection is a popular task in computer graphics and is widely used in industrial settings. Previous approaches often train CNN-based models (e.g., auto-encoders, GANs) to reconstruct covered parts of an input image and then compute the difference between the input and the reconstructed image. However, convolutional operations excel at extracting local features, which makes it difficult for them to identify larger image anomalies. To this end, we propose a transformer architecture based on mutual attention for image anomaly separation. This architecture captures long-range dependencies and fuses local features with global features to improve image anomaly detection. Our method was extensively evaluated on several benchmarks, and the experimental results show that it improves detection capability by 3.1% and localization capability by 1.0% compared with state-of-the-art reconstruction-based methods.
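The reconstruction-based scoring step described in the abstract (compare the input with its reconstruction) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `anomaly_map` and `image_score`, the squared-error measure, the max-pooling of the map into one image-level score, and the threshold `0.5` are all assumptions chosen for illustration, and the "reconstruction" here is a stand-in array and not the output of a trained model.

```python
import numpy as np

def anomaly_map(image: np.ndarray, reconstruction: np.ndarray) -> np.ndarray:
    """Per-pixel squared error averaged over channels: (H, W, C) -> (H, W)."""
    if image.shape != reconstruction.shape:
        raise ValueError("image and reconstruction must have the same shape")
    diff = image.astype(np.float64) - reconstruction.astype(np.float64)
    return (diff ** 2).mean(axis=-1)

def image_score(amap: np.ndarray) -> float:
    """Image-level score: the largest pixel error (one common choice, assumed here)."""
    return float(amap.max())

# Toy data: the stand-in reconstruction equals the defect-free image,
# mimicking a model that has only learned to reproduce normal content.
rng = np.random.default_rng(0)
clean = rng.random((8, 8, 3))
defective = clean.copy()
defective[2:4, 5:7, :] += 1.0  # hypothetical 2x2 defect patch

amap = anomaly_map(defective, clean)
mask = amap > 0.5  # hypothetical threshold for localization
print("image-level score:", image_score(amap))
print("flagged pixels:", int(mask.sum()))
```

In this sketch the error map supports localization (which pixels are flagged) and its maximum supports detection (whether the image as a whole is anomalous), which mirrors the two capabilities the abstract reports separately.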