Detecting Pedestrian With Incomplete Head Feature in Crowded Situation Based on Transformer
Authors: Zefei Chen; Yongjie Lin; Jianmin Xu; Kai Lu; Yanfang Shou
Journal: IEEE Signal Processing Letters, vol. 32, pp. 576-580 (impact factor 3.2; JCR Q2, Engineering, Electrical & Electronic)
Publication date: 2025-01-02
Publication type: Journal Article
DOI: 10.1109/LSP.2024.3525397
URL: https://ieeexplore.ieee.org/document/10820533/
Citations: 0
Abstract
Pedestrian detection in crowded scenes is a challenging task. This study presents a straightforward and effective method, Det RCNN, that detects pedestrians in crowded scenes and also pairs the body and head of each pedestrian. Pedestrians' heads have a stable shape and distinctive features; moreover, because heads usually sit higher in the image, they are rarely fully occluded even in a crowd. This study therefore equips the DETR model with a Head Decoder (HDecoder) parallel to the Decoder. The HDecoder takes the head knowledge generated in the Decoder phase as head queries and uses a key-query mechanism to search the entire image for the body bounding boxes corresponding to those head queries. Finally, the method performs a straightforward IoU (Intersection over Union) matching between the body bounding boxes produced in the Decoder and HDecoder phases. Because the HDecoder resembles the second stage of the Faster RCNN model, the paper terms the method Det RCNN (DETR RCNN). On the CrowdHuman dataset, the proposed model increases AP$_{m}$ from 53.02 to 53.87 compared with Deformable DETR, and decreases mMR$^{-2}$ from 52.46 to 42.32 compared with the existing BFJ.
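The final IoU matching step can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the box format, the matching strategy, or a threshold, so the `(x1, y1, x2, y2)` corner format, the greedy one-to-one assignment, the 0.5 threshold, and the names `iou` and `match_by_iou` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Assumed box format: (x1, y1, x2, y2) corners in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_by_iou(
    decoder_bodies: List[Box],
    hdecoder_bodies: List[Box],
    threshold: float = 0.5,  # hypothetical value, not taken from the paper
) -> List[Optional[int]]:
    """For each HDecoder body box (one per head query), return the index of
    the Decoder body box it is paired with, or None if no pair reaches the
    threshold. Greedy one-to-one assignment, highest IoU first."""
    pairs = sorted(
        (
            (iou(h, d), hi, di)
            for hi, h in enumerate(hdecoder_bodies)
            for di, d in enumerate(decoder_bodies)
        ),
        reverse=True,
    )
    matches: List[Optional[int]] = [None] * len(hdecoder_bodies)
    used_decoder = set()
    for score, hi, di in pairs:
        if score < threshold:
            break  # remaining pairs overlap even less
        if matches[hi] is None and di not in used_decoder:
            matches[hi] = di
            used_decoder.add(di)
    return matches


if __name__ == "__main__":
    decoder_bodies = [(0, 0, 10, 20), (8, 0, 18, 20)]
    hdecoder_bodies = [(9, 0, 19, 20), (0, 1, 10, 21), (100, 100, 110, 120)]
    print(match_by_iou(decoder_bodies, hdecoder_bodies))  # [1, 0, None]
```

In this toy example the third HDecoder box overlaps no Decoder box and stays unpaired. A globally optimal assignment (for example, Hungarian matching) could replace the greedy loop; the abstract only states that the matching is straightforward.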
About the journal:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.