Towards Accurate Crowd Counting Via Smoothed Dilated Convolutions and Transformer
Xin Zeng, Huake Wang, Gaoyi Zhu, Yunpeng Wu
2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI), 2023-05-26
DOI: 10.1109/CCAI57533.2023.10201260 (https://doi.org/10.1109/CCAI57533.2023.10201260)
Citations: 0
Abstract
Density-based methods have shown promising results on crowd counting. Many existing methods extract multi-scale features with dilated convolutions, but dilated convolutions are consistently plagued by gridding artifacts. In this work, we propose to resolve the gridding artifacts with a smooth dilated residual block (SDRB). The smoothed dilation technique adds separable and shared convolutions that introduce dependency among feature maps. Moreover, we present a residual contextual transformer block (RCTB) for multi-scale feature generation. The RCTB enables people to be located and recognized at the pixel level. Finally, extensive experiments corroborate the prediction accuracy and the generalization capability of the model. Our model achieves superior performance on three realistic public benchmarks: JHU-CROWD++, ShanghaiTech, and FDST.
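
The gridding problem mentioned in the abstract can be illustrated with a small sketch. It is not the authors' SDRB; it is a simplified 1-D illustration that only tracks which input positions each output unit reads. The function names `dilated_taps` and `smoothed_dilated_taps` are hypothetical, and the smoothing width of `2 * dilation - 1` is an assumption not stated in the abstract.

```python
def dilated_taps(center, kernel_size=3, dilation=2):
    """Input indices read by one output unit of a 1-D dilated convolution."""
    half = kernel_size // 2
    return {center + dilation * k for k in range(-half, half + 1)}


def smoothed_dilated_taps(center, kernel_size=3, dilation=2):
    """Input indices read when a shared smoothing convolution is applied
    before the dilated convolution.

    Assumption: the smoothing kernel has width 2 * dilation - 1, so every
    dilated tap also sees its (dilation - 1) neighbours on each side.
    """
    radius = dilation - 1
    taps = set()
    for t in dilated_taps(center, kernel_size, dilation):
        taps.update(range(t - radius, t + radius + 1))
    return taps


# Two adjacent output positions under plain dilation (rate 2).
plain_a = dilated_taps(10)
plain_b = dilated_taps(11)
print(sorted(plain_a))           # [8, 10, 12]
print(sorted(plain_b))           # [9, 11, 13]
print(sorted(plain_a & plain_b)) # [] : neighbours share no input (gridding)

# The same two positions with the smoothing step in front.
smooth_a = smoothed_dilated_taps(10)
smooth_b = smoothed_dilated_taps(11)
print(sorted(smooth_a & smooth_b))  # [8, 9, 10, 11, 12, 13]
```

With plain dilation, neighbouring outputs are computed from disjoint sets of inputs, which is what produces the grid-like pattern. Once a shared smoothing step precedes the dilated convolution, neighbouring outputs overlap in the inputs they read, which is one way to interpret the "dependency among feature maps" the abstract refers to.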