Scale Adaptive Enhance Network for Crowd Counting

Zirui Fan, Jun Ruan
2022 11th International Conference on Educational and Information Technology (ICEIT) · 2022-01-06
DOI: 10.1109/ICEIT54416.2022.9690718
Citations: 0

Abstract

Crowd counting is a fundamental computer vision task that plays a critical role in video structure analysis and potential downstream applications, e.g., accident forecasting and urban traffic analysis. The main challenges of crowd counting lie in the scale variation caused by irregularly distributed "person-camera" distances, as well as interference from complex backgrounds. To address these issues, we propose a Scale Adaptive Enhance Network (SAENet) based on the encoder-decoder U-Net architecture. We employ Res2Net as the encoder backbone to extract multi-scale head information and relieve the scale variation problem. The decoder consists of two branches: an Attention Estimation Network (AENet) that provides attention maps and a Density Estimation Network (DENet) that generates density maps. To fully leverage the complementarity between AENet and DENet, we propose two modules that enhance feature transfer: i) a lightweight, plug-and-play interactive attention module (IA-block) deployed at multiple levels of the decoder to refine the feature maps; ii) a global scale adaptive fusion strategy (GSAFS) that adaptively models diverse scale cues to obtain the weighted density map. Extensive experiments show that the proposed method outperforms existing competitive methods and establishes state-of-the-art results on ShanghaiTech Part A and B and on UCF-QNRF. Our model achieves MAE of 53.56 and 5.95 on ShanghaiTech Part A and B, corresponding to performance improvements of 6.0% and 13.13%, respectively.
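The two decoder ideas described above — an attention map gating a density map, and an adaptive weighting of per-scale density maps — can be illustrated with a minimal NumPy sketch. This is a hypothetical toy of the general attention-gating and softmax-weighted fusion concepts, not the authors' SAENet implementation; the function names `attention_gated_density` and `fuse_scales` and the softmax weighting are our own assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_gated_density(density_logits, attention_logits):
    """Gate a raw density map (DENet-style output) with a sigmoid
    attention map (AENet-style output), element-wise."""
    attention = sigmoid(attention_logits)      # (0, 1): foreground vs. background
    density = np.maximum(density_logits, 0.0)  # keep densities non-negative
    return attention * density

def fuse_scales(density_maps, scale_scores):
    """Softmax-weighted fusion of per-scale density maps into one
    weighted density map (the general idea behind a strategy like GSAFS)."""
    weights = softmax(np.asarray(scale_scores, dtype=float))
    return sum(w * d for w, d in zip(weights, density_maps))

# Toy usage: two scales of 4x4 maps; the crowd count is the map's integral (sum).
rng = np.random.default_rng(0)
maps = [attention_gated_density(rng.normal(size=(4, 4)),
                                rng.normal(size=(4, 4))) for _ in range(2)]
fused = fuse_scales(maps, scale_scores=[0.3, 1.1])
estimated_count = fused.sum()
```

In a real network the attention logits, density logits, and scale scores would each come from learned convolutional branches; here they are random arrays, serving only to show how gating and weighted fusion combine into a single density map whose sum gives the count estimate.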