Rail Surface Defects Detection Based on Yolo v5 Integrated with Transformer
Qian-mao Hu, Bo Tang, Lin Jiang, Faxun Zhu, Xiaoke Zhao
2022 IEEE 5th International Conference on Electronics Technology (ICET), published 2022-05-13
DOI: 10.1109/icet55676.2022.9824255
Abstract
Traditional machine-vision detection methods require the features of the target to be designed by hand, so the resulting features have limited expressive power and generalize poorly. Deep learning learns high-level features automatically, which improves the efficiency and accuracy of image recognition and adapts better to new conditions. The Transformer abandons the convolutional structure of CNNs in favor of a deep network built mainly on the self-attention mechanism, which can be computed in parallel and captures global information. This paper combines a CNN with a Transformer, integrating the Transformer's attention mechanism into the YOLOv5 network structure to detect rail surface defects. On the RSDDs dataset, the AP (average precision) for Type-I and Type-II rail defects reached 99.5% and 97.8%, respectively, at 76.92 FPS (frames per second).
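The abstract's description of self-attention (parallel computation, global information) can be made concrete with a short sketch. The abstract does not say where in YOLOv5 the attention is inserted, how many heads it uses, or its dimensions, so everything below is an assumption: single-head scaled dot-product attention with a residual connection, applied to one (C, H, W) CNN feature map whose spatial positions are flattened into H·W tokens. The function name `self_attention_on_feature_map` and the random projection matrices are hypothetical stand-ins for learned parameters, not the authors' implementation.

```python
import numpy as np


def self_attention_on_feature_map(feat, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a CNN feature map.

    Illustrative sketch only (assumed design, not the paper's code).
    feat: array of shape (C, H, W), e.g. one image's backbone output.
    w_q, w_k, w_v: hypothetical (C, C) projection matrices (learned in practice).
    Returns an array with the same (C, H, W) shape.
    """
    c, h, w = feat.shape
    # Flatten the spatial grid into H*W tokens, each a C-dimensional vector.
    tokens = feat.reshape(c, h * w).T            # (H*W, C)
    q = tokens @ w_q                             # queries (H*W, C)
    k = tokens @ w_k                             # keys    (H*W, C)
    v = tokens @ w_v                             # values  (H*W, C)
    # Every token is scored against every other token in one matrix product:
    # this is the "global information" that a local convolution lacks.
    scores = q @ k.T / np.sqrt(c)                # (H*W, H*W)
    scores -= scores.max(axis=1, keepdims=True)  # softmax numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    out = weights @ v                            # (H*W, C)
    # Residual connection, then restore the (C, H, W) layout for the CNN head.
    return (tokens + out).T.reshape(c, h, w)


# Usage with a toy 8-channel, 4x5 feature map and small random projections.
rng = np.random.default_rng(0)
C, H, W = 8, 4, 5
feat = rng.standard_normal((C, H, W))
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
out = self_attention_on_feature_map(feat, w_q, w_k, w_v)
print(out.shape)  # (8, 4, 5)
```

Because all H·W tokens are compared in a single matrix product, the operation parallelizes well and gives each position a global receptive field. The cost is an (H·W)×(H·W) attention matrix that grows quadratically with resolution, which is why such layers are commonly placed on deep, low-resolution feature maps.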