RCSLFNet: a novel real-time pedestrian detection network based on re-parameterized convolution and channel-spatial location fusion attention for low-resolution infrared image

IF 2.9 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Journal of Real-Time Image Processing Pub Date : 2024-05-11 DOI:10.1007/s11554-024-01469-x

Shuai Hao, Zhengqi Liu, Xu Ma, Yingqi Wu, Tian He, Jiahao Li

{"title":"RCSLFNet: a novel real-time pedestrian detection network based on re-parameterized convolution and channel-spatial location fusion attention for low-resolution infrared image","authors":"Shuai Hao, Zhengqi Liu, Xu Ma, Yingqi Wu, Tian He, Jiahao Li","doi":"10.1007/s11554-024-01469-x","DOIUrl":null,"url":null,"abstract":"<p>A novel real-time infrared pedestrian detection algorithm is introduced in this study. The proposed approach leverages re-parameterized convolution and channel-spatial location fusion attention to tackle the difficulties presented by low-resolution, partial occlusion, and environmental interference in infrared pedestrian images. These factors have historically hindered the accurate detection of pedestrians using traditional algorithms. First, to tackle the problem of weak feature representation of infrared pedestrian targets caused by low resolution and partial occlusion, a new attention module that integrates channel and spatial is devised and introduced to CSPDarkNet53 to design a new backbone CSLF-DarkNet53. The designed attention model can enhance the feature expression ability of pedestrian targets and make pedestrian targets more prominent in complex backgrounds. Second, to enhance the efficiency of detection and accelerate convergence, a multi-branch decoupled detector head is designed to operate the classification and location of infrared pedestrians separately. Finally, to improve poor real-time without losing precision, we introduce the re-parameterized convolution (Repconv) using parameter identity transformation to decouple the training process and detection process. During the training procedure, to enhance the fitting ability of small convolution kernels, a multi-branch structure with convolution kernels of different scales is designed. Compared with the nice classical detection algorithms, the results of the experiment show that the proposed RCSLFNet not only detects partial occlusion infrared pedestrians in complex environments accurately but also has better real-time performance on the KAIST dataset. The mAP@0.5 reaches 86% and the detection time is 0.0081 s, 2.9% higher than the baseline.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"43 1","pages":""},"PeriodicalIF":2.9000,"publicationDate":"2024-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Real-Time Image Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11554-024-01469-x","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

A novel real-time infrared pedestrian detection algorithm is introduced in this study. The proposed approach leverages re-parameterized convolution and channel-spatial location fusion attention to tackle the difficulties presented by low-resolution, partial occlusion, and environmental interference in infrared pedestrian images. These factors have historically hindered the accurate detection of pedestrians using traditional algorithms. First, to tackle the problem of weak feature representation of infrared pedestrian targets caused by low resolution and partial occlusion, a new attention module that integrates channel and spatial is devised and introduced to CSPDarkNet53 to design a new backbone CSLF-DarkNet53. The designed attention model can enhance the feature expression ability of pedestrian targets and make pedestrian targets more prominent in complex backgrounds. Second, to enhance the efficiency of detection and accelerate convergence, a multi-branch decoupled detector head is designed to operate the classification and location of infrared pedestrians separately. Finally, to improve poor real-time without losing precision, we introduce the re-parameterized convolution (Repconv) using parameter identity transformation to decouple the training process and detection process. During the training procedure, to enhance the fitting ability of small convolution kernels, a multi-branch structure with convolution kernels of different scales is designed. Compared with the nice classical detection algorithms, the results of the experiment show that the proposed RCSLFNet not only detects partial occlusion infrared pedestrians in complex environments accurately but also has better real-time performance on the KAIST dataset. The mAP@0.5 reaches 86% and the detection time is 0.0081 s, 2.9% higher than the baseline.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

RCSLFNet：基于重新参数化卷积和通道空间位置融合注意力的新型实时行人检测网络，适用于低分辨率红外图像

本研究介绍了一种新型实时红外行人检测算法。所提出的方法利用重新参数化卷积和信道空间位置融合注意力来解决红外行人图像中的低分辨率、部分遮挡和环境干扰所带来的困难。这些因素一直阻碍着传统算法对行人的准确检测。首先，针对低分辨率和部分遮挡导致的红外行人目标特征表征不强的问题，设计了一种融合通道和空间的新注意力模块，并将其引入到 CSPDarkNet53 中，设计出一种新的主干 CSLF-DarkNet53。设计的注意力模型可以增强行人目标的特征表达能力，使行人目标在复杂背景中更加突出。其次，为了提高检测效率，加快收敛速度，设计了多分支解耦检测头，将红外行人的分类和定位分开操作。最后，为了在不损失精度的情况下改善较差的实时性，我们引入了重新参数化卷积（Repconv），利用参数标识变换将训练过程和检测过程解耦。在训练过程中，为了提高小卷积核的拟合能力，我们设计了一种具有不同尺度卷积核的多分支结构。实验结果表明，与优秀的经典检测算法相比，所提出的 RCSLFNet 不仅能准确检测复杂环境中部分遮挡的红外行人，而且在 KAIST 数据集上具有更好的实时性能。mAP@0.5 达到 86%，检测时间为 0.0081 s，比基线高出 2.9%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Journal of Real-Time Image Processing COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-ENGINEERING, ELECTRICAL & ELECTRONIC

CiteScore

6.80

自引率

6.70%

发文量

审稿时长

6 months

期刊介绍： Due to rapid advancements in integrated circuit technology, the rich theoretical results that have been developed by the image and video processing research community are now being increasingly applied in practical systems to solve real-world image and video processing problems. Such systems involve constraints placed not only on their size, cost, and power consumption, but also on the timeliness of the image data processed. Examples of such systems are mobile phones, digital still/video/cell-phone cameras, portable media players, personal digital assistants, high-definition television, video surveillance systems, industrial visual inspection systems, medical imaging devices, vision-guided autonomous robots, spectral imaging systems, and many other real-time embedded systems. In these real-time systems, strict timing requirements demand that results are available within a certain interval of time as imposed by the application. It is often the case that an image processing algorithm is developed and proven theoretically sound, presumably with a specific application in mind, but its practical applications and the detailed steps, methodology, and trade-off analysis required to achieve its real-time performance are not fully explored, leaving these critical and usually non-trivial issues for those wishing to employ the algorithm in a real-time system. The Journal of Real-Time Image Processing is intended to bridge the gap between the theory and practice of image processing, serving the greater community of researchers, practicing engineers, and industrial professionals who deal with designing, implementing or utilizing image processing systems which must satisfy real-time design constraints.