学习可扩展的全尺度分布人群计数

IF 3.1 4区 计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Journal of Visual Communication and Image Representation Pub Date : 2025-03-01 Epub Date: 2025-01-18 DOI:10.1016/j.jvcir.2025.104387
Huake Wang , Xingsong Hou , Kaibing Zhang , Xin Zeng , Minqi Li , Wenke Sun , Xueming Qian
{"title":"学习可扩展的全尺度分布人群计数","authors":"Huake Wang ,&nbsp;Xingsong Hou ,&nbsp;Kaibing Zhang ,&nbsp;Xin Zeng ,&nbsp;Minqi Li ,&nbsp;Wenke Sun ,&nbsp;Xueming Qian","doi":"10.1016/j.jvcir.2025.104387","DOIUrl":null,"url":null,"abstract":"<div><div>Crowd counting is challenged by large appearance variations of individuals in uncontrolled scenes. Many previous approaches elaborated on this problem by learning multi-scale features and concatenating them together for more impressive performance. However, such a naive fusion is intuitional and not optimal enough for a wide range of scale variations. In this paper, we propose a novel feature fusion scheme, called Scalable Omni-scale Distribution Fusion (SODF), which leverages the benefits of different scale distributions from multi-layer feature maps to approximate the real distribution of target scale. Inspired by Gaussian Mixture Model that surmounts multi-scale feature fusion from a probabilistic perspective, our SODF module adaptively integrate multi-layer feature maps without embedding any multi-scale structures. The SODF module is comprised of two major components: an interaction block that perceives the real distribution and an assignment block which assigns the weights to the multi-layer or multi-column feature maps. The newly proposed SODF module is scalable, light-weight, and plug-and-play, and can be flexibly embedded into other counting networks. In addition, we design a counting model (SODF-Net) with SODF module and multi-layer structure. Extensive experiments on four benchmark datasets manifest that the proposed SODF-Net performs favorably against the state-of-the-art counting models. Furthermore, the proposed SODF module can efficiently improve the prediction performance of canonical counting networks, e.g., MCNN, CSRNet, and CAN.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104387"},"PeriodicalIF":3.1000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning scalable Omni-scale distribution for crowd counting\",\"authors\":\"Huake Wang ,&nbsp;Xingsong Hou ,&nbsp;Kaibing Zhang ,&nbsp;Xin Zeng ,&nbsp;Minqi Li ,&nbsp;Wenke Sun ,&nbsp;Xueming Qian\",\"doi\":\"10.1016/j.jvcir.2025.104387\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Crowd counting is challenged by large appearance variations of individuals in uncontrolled scenes. Many previous approaches elaborated on this problem by learning multi-scale features and concatenating them together for more impressive performance. However, such a naive fusion is intuitional and not optimal enough for a wide range of scale variations. In this paper, we propose a novel feature fusion scheme, called Scalable Omni-scale Distribution Fusion (SODF), which leverages the benefits of different scale distributions from multi-layer feature maps to approximate the real distribution of target scale. Inspired by Gaussian Mixture Model that surmounts multi-scale feature fusion from a probabilistic perspective, our SODF module adaptively integrate multi-layer feature maps without embedding any multi-scale structures. The SODF module is comprised of two major components: an interaction block that perceives the real distribution and an assignment block which assigns the weights to the multi-layer or multi-column feature maps. The newly proposed SODF module is scalable, light-weight, and plug-and-play, and can be flexibly embedded into other counting networks. In addition, we design a counting model (SODF-Net) with SODF module and multi-layer structure. Extensive experiments on four benchmark datasets manifest that the proposed SODF-Net performs favorably against the state-of-the-art counting models. Furthermore, the proposed SODF module can efficiently improve the prediction performance of canonical counting networks, e.g., MCNN, CSRNet, and CAN.</div></div>\",\"PeriodicalId\":54755,\"journal\":{\"name\":\"Journal of Visual Communication and Image Representation\",\"volume\":\"107 \",\"pages\":\"Article 104387\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2025-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Visual Communication and Image Representation\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S104732032500001X\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/18 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Visual Communication and Image Representation","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S104732032500001X","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/18 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

摘要

在不受控制的场景中,个体的巨大外观变化对人群计数提出了挑战。许多以前的方法通过学习多尺度特征并将它们连接在一起来阐述这个问题,以获得更令人印象深刻的性能。然而,这种幼稚的融合是直观的,对于大范围的尺度变化来说不够理想。本文提出了一种新的特征融合方案——可扩展全尺度分布融合(SODF),该方案利用多层特征映射中不同尺度分布的优势来近似目标尺度的真实分布。受高斯混合模型的启发,从概率角度超越了多尺度特征融合,我们的SODF模块自适应集成多层特征映射,而不嵌入任何多尺度结构。SODF模块由两个主要部分组成:感知真实分布的交互块和为多层或多列特征映射分配权重的分配块。新提出的SODF模块具有可扩展、轻量级和即插即用的特点,可以灵活地嵌入到其他计数网络中。此外,我们还设计了一个具有SODF模块和多层结构的计数模型(SODF- net)。在四个基准数据集上进行的大量实验表明,所提出的SODF-Net与最先进的计数模型相比表现良好。此外,所提出的SODF模块可以有效地提高MCNN、CSRNet和can等规范计数网络的预测性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Learning scalable Omni-scale distribution for crowd counting
Crowd counting is challenged by large appearance variations of individuals in uncontrolled scenes. Many previous approaches elaborated on this problem by learning multi-scale features and concatenating them together for more impressive performance. However, such a naive fusion is intuitional and not optimal enough for a wide range of scale variations. In this paper, we propose a novel feature fusion scheme, called Scalable Omni-scale Distribution Fusion (SODF), which leverages the benefits of different scale distributions from multi-layer feature maps to approximate the real distribution of target scale. Inspired by Gaussian Mixture Model that surmounts multi-scale feature fusion from a probabilistic perspective, our SODF module adaptively integrate multi-layer feature maps without embedding any multi-scale structures. The SODF module is comprised of two major components: an interaction block that perceives the real distribution and an assignment block which assigns the weights to the multi-layer or multi-column feature maps. The newly proposed SODF module is scalable, light-weight, and plug-and-play, and can be flexibly embedded into other counting networks. In addition, we design a counting model (SODF-Net) with SODF module and multi-layer structure. Extensive experiments on four benchmark datasets manifest that the proposed SODF-Net performs favorably against the state-of-the-art counting models. Furthermore, the proposed SODF module can efficiently improve the prediction performance of canonical counting networks, e.g., MCNN, CSRNet, and CAN.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Visual Communication and Image Representation
Journal of Visual Communication and Image Representation 工程技术-计算机:软件工程
CiteScore
5.40
自引率
11.50%
发文量
188
审稿时长
9.9 months
期刊介绍: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.
期刊最新文献
Gradually Vanishing Gap in Prototypical Network for unsupervised domain adaptation GIFT: A Green Image Forgery Tracker MLHFA: Multi-local high-frequency attention and feature aggregation for generalizable deepfake detection Leveraging ISTA-inspired Transformer and multi-scale CNN for image compressive sensing Multi-scale dense network with pyramid pooling for glaucoma screening
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1