Spiking ViT: spiking neural networks with transformer-attention for steel surface defect classification

IF 1 | Zone 4 (Computer Science) | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Journal of Electronic Imaging | Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033001
Liang Gong, Hang Dong, Xinyu Zhang, Xin Cheng, Fan Ye, Liangchao Guo, Zhenghui Ge
Citations: 0

Abstract

Spiking ViT: spiking neural networks with transformer—attention for steel surface defect classification
Surface defects inevitably arise throughout steel production. They impair product quality and reduce manufacturing efficiency, so it is important to study and classify the defects that appear on steel strip surfaces. The vision transformer (ViT) is a neural network model based on self-attention that is widely used across many disciplines. A conventional ViT ignores the specifics of brain signaling and uses activation functions to stand in for biological neurons. The leaky integrate-and-fire (LIF) neuron, a fundamental building block of spiking neural networks, has dynamics closer to those of a biological neuron. LIF neurons are event-driven, which allows higher performance at lower power. This work integrates ViT with LIF neurons to build and train an end-to-end hybrid architecture, the spiking vision transformer (S-ViT), for steel surface defect classification. The framework keeps the ViT architecture but replaces its activation functions with LIF neurons, yielding a spiking transformer encoder that acts as a global spike feature fusion module and a spiking-MLP classification head, which serve as the basic building blocks of S-ViT. In the experiments, the method performs strongly on every reported metric: the overall test accuracies of S-ViT are 99.41%, 99.65%, 99.54%, and 99.77% on NEU-CLSs, and 95.70%, 95.93%, 96.94%, and 97.19% on XSDD. S-ViT outperforms convolutional neural networks and recently published results, and it also improves on the original ViT model. In robustness tests, S-ViT maintains reliable accuracy on images corrupted by Gaussian noise.
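To make the role of the LIF neuron concrete, the sketch below simulates a single neuron with a commonly used discrete-time update: the membrane potential leaks toward the input, and the neuron emits a spike and resets once the potential crosses a threshold. The abstract gives no equations or hyperparameters, so the function name `lif_forward`, the update rule, and the values of `tau`, `v_threshold`, and `v_reset` are all assumptions for illustration, not the authors' implementation.

```python
from typing import List


def lif_forward(currents: List[float], tau: float = 2.0,
                v_threshold: float = 1.0, v_reset: float = 0.0) -> List[int]:
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    currents: input current at each time step (hypothetical values).
    Returns a 0/1 spike train with the same length as `currents`.
    All default parameters are illustrative assumptions.
    """
    v = v_reset  # membrane potential starts at the reset value
    spikes = []
    for x in currents:
        # Leaky integration: the potential decays toward v_reset while
        # being driven by the input current.
        v = v + (x - (v - v_reset)) / tau
        if v >= v_threshold:
            spikes.append(1)  # fire
            v = v_reset       # hard reset after a spike
        else:
            spikes.append(0)  # stay silent; potential carries over
    return spikes


if __name__ == "__main__":
    strong = lif_forward([1.5] * 8)
    weak = lif_forward([0.5] * 8)
    print("strong input spikes:", strong)
    print("weak input spikes:  ", weak)
    print("strong firing rate: ", sum(strong) / len(strong))
```

Unlike a ReLU or GELU, the output here is a binary event stream whose firing rate encodes input strength; a weak input that never reaches the threshold produces no events at all. Because the thresholding step is not differentiable, end-to-end training of spiking networks typically relies on a surrogate gradient, which this sketch omits.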
Source journal
Journal of Electronic Imaging
Category: Engineering & Technology, Imaging Science & Photographic Technology
CiteScore: 1.70
Self-citation rate: 27.30%
Articles published: 341
Review time: 4.0 months
About the journal: The Journal of Electronic Imaging publishes peer-reviewed papers in all technology areas that make up the field of electronic imaging and are normally considered in the design, engineering, and applications of electronic imaging systems.