DHT: Dynamic Vision Transformer Using Hybrid Window Attention for Industrial Defect Images Classification
Authors: Chao Ding, Donglin Teng, Xianghua Zheng, Qiang Wang, Yuanyuan He, Zhang Long
Journal: IEEE Instrumentation & Measurement Magazine, vol. 26, no. 1, pp. 19-28
Publication date: 2023-04-01
DOI: 10.1109/MIM.2023.10083000 (https://doi.org/10.1109/MIM.2023.10083000)
Impact factor: 1.6 (JCR Q3, Engineering, Electrical & Electronic)
Industrial defect detection is increasingly important for controlling industrial product quality. Because industrial defect types are complex and variable, detecting them accurately and efficiently is an interesting but challenging problem. Vision transformers have been highly successful in a variety of computer vision tasks because they can capture global information in images. Nevertheless, capturing only global information is problematic. On the one hand, because transformers lack the inductive biases of convolutional neural networks (CNNs), they have difficulty focusing on the local features of defects in industrial defect image inspection tasks. On the other hand, global computation leads to excessive memory and computational cost. To mitigate these issues, we propose a new vision transformer architecture that contains Hybrid Window Attention (HWA) and Dynamic Token Normalization (DTN). HWA, which combines pooling attention and window attention, reduces computational complexity and thereby improves efficiency. DTN enables transformers to focus on both the global information and the local features of defects, thus improving the accuracy of industrial surface defect detection. Extensive experiments demonstrate that our Dynamic Vision Transformer (DHT) achieves 96.8% and 98.5% classification accuracy on the NEU dataset and the DAGM dataset, respectively, with low computational complexity.
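To give a rough sense of why window attention and pooling attention are cheaper than global attention, the sketch below counts query-key pairs for each scheme, a common proxy for attention cost. It is not the authors' HWA: the abstract does not describe how the two mechanisms are combined, and the function names, the 56×56 token grid, the window size of 7, and the pooling stride of 4 are all assumptions chosen only for illustration.

```python
def global_attention_pairs(height: int, width: int) -> int:
    """Every token attends to every token: N * N pairs, with N = height * width."""
    tokens = height * width
    return tokens * tokens


def window_attention_pairs(height: int, width: int, window: int) -> int:
    """Tokens attend only inside non-overlapping window x window tiles."""
    if height % window or width % window:
        raise ValueError("grid must be divisible by the window size")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2


def pooling_attention_pairs(height: int, width: int, stride: int) -> int:
    """Every query attends to keys pooled by stride x stride."""
    if height % stride or width % stride:
        raise ValueError("grid must be divisible by the pooling stride")
    tokens = height * width
    pooled_keys = (height // stride) * (width // stride)
    return tokens * pooled_keys


if __name__ == "__main__":
    # Hypothetical settings, not taken from the paper.
    h, w = 56, 56
    print("global :", global_attention_pairs(h, w))
    print("window :", window_attention_pairs(h, w, window=7))
    print("pooling:", pooling_attention_pairs(h, w, stride=4))
```

Under these assumed settings, global attention scales quadratically with the number of tokens, while the window and pooling variants scale linearly for a fixed window size or pooling stride.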
About the journal:
IEEE Instrumentation & Measurement Magazine is a bimonthly publication. It publishes in February, April, June, August, October, and December of each year. The magazine covers a wide variety of topics in instrumentation, measurement, and systems that measure or instrument equipment or other systems. The magazine has the goal of providing readable introductions and overviews of technology in instrumentation and measurement to a wide engineering audience. It does this through articles, tutorials, columns, and departments. Its goal is to cross disciplines to encourage further research and development in instrumentation and measurement.