Adaptive Multi-Task Learning for Multi-PAR in Real World
Authors: Haoyun Sun; Hongwei Zhao; Weishan Zhang; Liang Xu; Hongqing Guan
Journal: IEEE Journal of Radio Frequency Identification (JCR Q2, Engineering, Electrical & Electronic; impact factor 2.3)
Published: 2024-02-29
DOI: 10.1109/JRFID.2024.3371881
URL: https://ieeexplore.ieee.org/document/10454582/
Citation count: 0
Abstract
Multi-pedestrian attribute recognition (Multi-PAR) is a vital task for smart city surveillance applications; it requires identifying various attributes of multiple pedestrians in a single image. However, most existing methods are limited by complex backgrounds and time-consuming pedestrian detection preprocessing in real-world scenarios, and cannot achieve satisfactory accuracy and efficiency. In this paper, we present a novel end-to-end solution, the Adaptive Multi-Task Network (AMTN), which jointly performs multiple tasks and leverages an adaptive feature re-extraction (AFRE) module to optimize them. Specifically, we integrate pedestrian detection into AMTN to perform PAR preprocessing, and incorporate a person re-identification (ReID) task branch to track pedestrians in video streams, so that only the clearest video frames are analyzed rather than every frame, which improves both analysis efficiency and recognition accuracy. Moreover, we design a dynamic weight fitting loss (DWFL) function to prevent gradient explosions and balance tasks during training. We conduct extensive experiments to evaluate the accuracy and efficiency of our approach and compare it with state-of-the-art methods. The experimental results demonstrate that our method outperforms other state-of-the-art algorithms, achieving a 1.5%-4.9% improvement in accuracy on Multi-PAR. The experiments also show that AMTN greatly improves preprocessing efficiency by sharing basic features, which saves feature extraction computation. Compared with the state-of-the-art detection algorithm YOLOv5s, it improves efficiency by 42%.
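The frame-selection idea in the abstract (track each pedestrian, then analyze only the clearest frame) can be illustrated with a minimal sketch. The abstract does not say how AMTN measures clarity, so the variance of a 4-neighbour Laplacian is used here purely as an assumed stand-in, a common focus measure. The names `laplacian_variance` and `clearest_frame_per_track` and the `(track_id, frame_index, crop)` input layout are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]  # grayscale crop as rows of pixel intensities


def laplacian_variance(img: Image) -> float:
    """Assumed sharpness proxy: variance of the 4-neighbour Laplacian
    over interior pixels (higher means more edge detail)."""
    h, w = len(img), len(img[0])
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
        - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    if not responses:  # crop too small to have interior pixels
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def clearest_frame_per_track(
    observations: List[Tuple[int, int, Image]],
) -> Dict[int, int]:
    """Map each track id to the frame index of its sharpest crop.

    observations holds hypothetical (track_id, frame_index, crop) triples,
    where track_id would come from a ReID-style tracker.
    Ties keep the earliest frame seen.
    """
    best: Dict[int, Tuple[float, int]] = {}
    for track_id, frame_index, crop in observations:
        score = laplacian_variance(crop)
        if track_id not in best or score > best[track_id][0]:
            best[track_id] = (score, frame_index)
    return {tid: frame for tid, (_, frame) in best.items()}


# Toy data: a featureless crop versus a high-contrast checkerboard crop.
flat = [[5.0] * 4 for _ in range(4)]
checker = [[float((x + y) % 2) * 10 for x in range(4)] for y in range(4)]
obs = [(7, 0, flat), (7, 1, checker), (9, 0, checker), (9, 1, flat)]
selected = clearest_frame_per_track(obs)
```

In this toy run each track keeps the frame holding the checkerboard crop, so attribute recognition would run once per pedestrian rather than once per frame. A real pipeline would likely score crops with an image library and might also weigh occlusion or crop size, none of which the abstract specifies.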