Scale Adaptive Enhance Network for Crowd Counting

Zirui Fan, Jun Ruan
2022 11th International Conference on Educational and Information Technology (ICEIT) · 2022-01-06
DOI: 10.1109/ICEIT54416.2022.9690718
Citations: 0

Abstract

Crowd counting is a fundamental computer vision task that plays a critical role in video structure analysis and potential downstream applications, e.g., accident forecasting and urban traffic analysis. The main challenges of crowd counting lie in the scale variation caused by irregularly distributed "person-camera" distances, as well as interference from complex backgrounds. To address these issues, we propose a Scale Adaptive Enhance Network (SAENet) based on the encoder-decoder U-Net architecture. We employ Res2Net as the encoder backbone to extract multi-scale head information and relieve the scale variation problem. The decoder consists of two branches: an Attention Estimation Network (AENet) that provides attention maps and a Density Estimation Network (DENet) that generates density maps. To fully leverage the complementarity between AENet and DENet, we propose two modules that enhance feature transfer: i) a lightweight, plug-and-play interactive attention module (IA-block) deployed at multiple levels of the decoder to refine the feature maps; ii) a global scale adaptive fusion strategy (GSAFS) that adaptively models diverse scale cues to obtain the weighted density map. Extensive experiments show that the proposed method outperforms existing competitive methods and establishes state-of-the-art results on ShanghaiTech Part A and B and on UCF-QNRF. Our model achieves MAE of 53.56 and 5.95 on ShanghaiTech Part A and B, corresponding to performance improvements of 6.0% and 13.13%, respectively.
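The two decoder ideas described above — an attention map gating a density map, and an adaptive weighting of per-scale density maps — can be illustrated with a minimal NumPy sketch. This is a hypothetical toy of the general attention-gating and softmax-weighted fusion concepts, not the authors' SAENet implementation; the function names `attention_gated_density` and `fuse_scales` and the softmax weighting are our own assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_gated_density(density_logits, attention_logits):
    """Gate a raw density map (DENet-style output) with a sigmoid
    attention map (AENet-style output), element-wise."""
    attention = sigmoid(attention_logits)      # (0, 1): foreground vs. background
    density = np.maximum(density_logits, 0.0)  # keep densities non-negative
    return attention * density

def fuse_scales(density_maps, scale_scores):
    """Softmax-weighted fusion of per-scale density maps into one
    weighted density map (the general idea behind a strategy like GSAFS)."""
    weights = softmax(np.asarray(scale_scores, dtype=float))
    return sum(w * d for w, d in zip(weights, density_maps))

# Toy usage: two scales of 4x4 maps; the crowd count is the map's integral (sum).
rng = np.random.default_rng(0)
maps = [attention_gated_density(rng.normal(size=(4, 4)),
                                rng.normal(size=(4, 4))) for _ in range(2)]
fused = fuse_scales(maps, scale_scores=[0.3, 1.1])
estimated_count = fused.sum()
```

In a real network the attention logits, density logits, and scale scores would each come from learned convolutional branches; here they are random arrays, serving only to show how gating and weighted fusion combine into a single density map whose sum gives the count estimate.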