RCSLFNet: a novel real-time pedestrian detection network based on re-parameterized convolution and channel-spatial location fusion attention for low-resolution infrared image
Authors: Shuai Hao, Zhengqi Liu, Xu Ma, Yingqi Wu, Tian He, Jiahao Li
Journal: Journal of Real-Time Image Processing (impact factor 2.9; JCR Q2, Computer Science, Artificial Intelligence)
Publication date: 2024-05-11
DOI: 10.1007/s11554-024-01469-x (https://doi.org/10.1007/s11554-024-01469-x)
Citations: 0
Abstract
This study introduces a novel real-time infrared pedestrian detection algorithm. The proposed approach uses re-parameterized convolution and channel-spatial location fusion attention to address the difficulties posed by low resolution, partial occlusion, and environmental interference in infrared pedestrian images, factors that have historically hindered accurate pedestrian detection with traditional algorithms. First, to address the weak feature representation of infrared pedestrian targets caused by low resolution and partial occlusion, a new attention module that integrates channel and spatial information is devised and introduced into CSPDarkNet53, yielding a new backbone, CSLF-DarkNet53. The attention module enhances the feature expression of pedestrian targets and makes them more prominent in complex backgrounds. Second, to improve detection efficiency and accelerate convergence, a multi-branch decoupled detection head is designed to perform classification and localization of infrared pedestrians separately. Finally, to improve real-time performance without losing precision, we introduce re-parameterized convolution (Repconv), which uses a parameter identity transformation to decouple the training process from the detection process. During training, a multi-branch structure with convolution kernels of different scales is used to enhance the fitting ability of small convolution kernels. Compared with classical detection algorithms, experimental results on the KAIST dataset show that the proposed RCSLFNet not only accurately detects partially occluded infrared pedestrians in complex environments but also achieves better real-time performance. The mAP@0.5 reaches 86%, 2.9% higher than the baseline, and the detection time is 0.0081 s.
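The idea behind re-parameterized convolution, training with several parallel branches and then collapsing them into one kernel for detection, can be illustrated with a small sketch. This is not the authors' implementation: the branch layout (a 3x3 branch, a 1x1 branch, and an identity branch, as in RepVGG-style blocks), the omission of batch normalization and bias, and the helper names `conv2d` and `fuse_branches` are all assumptions made for illustration. The sketch only checks that the summed multi-branch output equals the output of a single fused 3x3 convolution.

```python
import random


def conv2d(x, w, pad):
    """Naive multi-channel 2D cross-correlation, stride 1, zero padding.

    x: [C_in][H][W], w: [C_out][C_in][k][k]; returns [C_out][H_out][W_out].
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    out_h = h + 2 * pad - k + 1
    out_w = wd + 2 * pad - k + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(len(w))]
    for o in range(len(w)):
        for i in range(out_h):
            for j in range(out_w):
                acc = 0.0
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            r, s = i + di - pad, j + dj - pad
                            if 0 <= r < h and 0 <= s < wd:
                                acc += x[c][r][s] * w[o][c][di][dj]
                out[o][i][j] = acc
    return out


def fuse_branches(w3, w1, channels):
    """Merge 3x3, 1x1, and identity branches into one 3x3 kernel (hypothetical helper)."""
    # Deep-copy the 3x3 kernel so the training-time weights stay untouched.
    fused = [[[row[:] for row in w3[o][c]] for c in range(channels)]
             for o in range(channels)]
    for o in range(channels):
        for c in range(channels):
            # A 1x1 kernel is a 3x3 kernel that is zero everywhere except the centre.
            fused[o][c][1][1] += w1[o][c][0][0]
        # The identity branch is a 3x3 kernel with 1 at the centre of its own channel.
        fused[o][o][1][1] += 1.0
    return fused


random.seed(0)
C, H, W = 2, 5, 5  # assumed toy sizes; input and output channel counts must match
x = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w3 = [[[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
       for _ in range(C)] for _ in range(C)]
w1 = [[[[random.uniform(-1, 1)]] for _ in range(C)] for _ in range(C)]

# Training-time structure: three parallel branches whose outputs are summed.
y3 = conv2d(x, w3, pad=1)
y1 = conv2d(x, w1, pad=0)
multi_branch = [[[y3[o][i][j] + y1[o][i][j] + x[o][i][j] for j in range(W)]
                 for i in range(H)] for o in range(C)]

# Detection-time structure: a single 3x3 convolution with the fused kernel.
single_branch = conv2d(x, fuse_branches(w3, w1, C), pad=1)

max_diff = max(abs(multi_branch[o][i][j] - single_branch[o][i][j])
               for o in range(C) for i in range(H) for j in range(W))
print(f"max abs difference: {max_diff:.2e}")
```

Because convolution is linear, the fused kernel reproduces the multi-branch output up to floating-point rounding, which is why the extra branches add no cost at detection time. A real block would also fold batch-normalization statistics into the kernel and bias before summing.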
About the journal:
Due to rapid advancements in integrated circuit technology, the rich theoretical results that have been developed by the image and video processing research community are now being increasingly applied in practical systems to solve real-world image and video processing problems. Such systems involve constraints placed not only on their size, cost, and power consumption, but also on the timeliness of the image data processed.
Examples of such systems are mobile phones, digital still/video/cell-phone cameras, portable media players, personal digital assistants, high-definition television, video surveillance systems, industrial visual inspection systems, medical imaging devices, vision-guided autonomous robots, spectral imaging systems, and many other real-time embedded systems. In these real-time systems, strict timing requirements demand that results are available within a certain interval of time as imposed by the application.
It is often the case that an image processing algorithm is developed and proven theoretically sound, presumably with a specific application in mind, but its practical applications and the detailed steps, methodology, and trade-off analysis required to achieve its real-time performance are not fully explored, leaving these critical and usually non-trivial issues for those wishing to employ the algorithm in a real-time system.
The Journal of Real-Time Image Processing is intended to bridge the gap between the theory and practice of image processing, serving the greater community of researchers, practicing engineers, and industrial professionals who deal with designing, implementing or utilizing image processing systems which must satisfy real-time design constraints.