Learning Crowd Scale and Distribution for Weakly Supervised Crowd Counting and Localization

IF 11.1 · CAS Region 1 (Engineering & Technology) · JCR Q1 (Engineering, Electrical & Electronic) · IEEE Transactions on Circuits and Systems for Video Technology · Pub Date: 2024-09-13 · DOI: 10.1109/TCSVT.2024.3460482
Yaowu Fan;Jia Wan;Andy J. Ma
{"title":"Learning Crowd Scale and Distribution for Weakly Supervised Crowd Counting and Localization","authors":"Yaowu Fan;Jia Wan;Andy J. Ma","doi":"10.1109/TCSVT.2024.3460482","DOIUrl":null,"url":null,"abstract":"The count supervision used in weakly-supervised crowd counting is derived from the number of point annotations, which means that the labeling cost is not effectively reduced. Moreover, due to the lack of spatial information about the pedestrians during training, previous works struggle to accurately learn the positions of individuals. To address these challenges, we propose a crowd counting and localization method based on scene-specific synthetic data for surveillance scenarios, which can accurately predict the number and location of person without any manually labeled point-wise or count-wise annotations. Our method dynamically adjust scene-specific synthetic data to minimize domain differences from surveillance scenes by learning the crowd scale and distribution. Specifically, based on realistic synthetic data, the models learn precise location and scale information, which can then regenerate new synthetic data with a more reasonable pedestrian distribution and scale and generate high-quality pseudo point-wise annotations. Subsequently, the counter is trained using our proposed robust soft-weighted loss function, under the joint supervision of auto-generated point-wise annotations on synthetic data and pseudo point-wise annotations on real data in an end-to-end manner. Our proposed loss function, based on the designed weighted optimal transport, effectively mitigates noise in pseudo point-wise labels and is not only insensitive to hyperparemeters but also exhibits superior generalization ability on real data. We conduct comprehensive experiments across multiple scene-specific datasets, demonstrating our method’s superiority in counting and localization performance over count-supervised, fully-supervised, and state-of-the-art domain adaption algorithms. Code is available at <uri>https://github.com/fyw1999/LCSD</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 1","pages":"713-727"},"PeriodicalIF":11.1000,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10680129/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

The count supervision used in weakly-supervised crowd counting is derived from the number of point annotations, which means that the labeling cost is not effectively reduced. Moreover, due to the lack of spatial information about pedestrians during training, previous works struggle to accurately learn the positions of individuals. To address these challenges, we propose a crowd counting and localization method based on scene-specific synthetic data for surveillance scenarios, which can accurately predict the number and locations of individuals without any manually labeled point-wise or count-wise annotations. Our method dynamically adjusts scene-specific synthetic data to minimize domain differences from surveillance scenes by learning the crowd scale and distribution. Specifically, the model first learns precise location and scale information from realistic synthetic data, which is then used to regenerate new synthetic data with a more reasonable pedestrian distribution and scale and to generate high-quality pseudo point-wise annotations. Subsequently, the counter is trained end-to-end with our proposed robust soft-weighted loss function, under the joint supervision of auto-generated point-wise annotations on synthetic data and pseudo point-wise annotations on real data. The proposed loss function, built on a designed weighted optimal transport, effectively mitigates noise in pseudo point-wise labels; it is not only insensitive to hyperparameters but also exhibits superior generalization ability on real data. We conduct comprehensive experiments across multiple scene-specific datasets, demonstrating our method's superiority in counting and localization performance over count-supervised, fully-supervised, and state-of-the-art domain adaptation algorithms. Code is available at https://github.com/fyw1999/LCSD.
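The paper's exact weighted-optimal-transport loss is defined in the full text; as a rough illustration of the idea only, the sketch below computes an entropic optimal-transport loss between a predicted density map and pseudo point annotations, with per-point soft weights down-weighting noisy pseudo labels. All names (`sinkhorn`, `soft_weighted_ot_loss`) and the specific confidence-weighting scheme are assumptions for illustration, not the authors' released code.

```python
# Illustrative sketch only: the authors' actual weighted-OT loss differs in
# detail. The soft weighting via per-point confidences is an assumption here.
import torch

def sinkhorn(cost, a, b, eps=0.1, n_iters=50):
    """Entropic-regularized OT plan between histograms a (N,) and b (M,)."""
    K = torch.exp(-cost / eps)                  # Gibbs kernel, (N, M)
    u = torch.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.t() @ u + 1e-8)              # column scaling, (M,)
        u = a / (K @ v + 1e-8)                  # row scaling, (N,)
    return u.unsqueeze(1) * K * v.unsqueeze(0)  # transport plan, (N, M)

def soft_weighted_ot_loss(pred_density, pixel_xy, points, conf):
    """OT loss between predicted density and confidence-weighted pseudo points.

    pred_density: (N,) non-negative predicted density at N pixel locations
    pixel_xy:     (N, 2) coordinates of those pixels
    points:       (M, 2) pseudo point-wise annotations
    conf:         (M,) soft weights in [0, 1] down-weighting noisy points
    """
    a = pred_density / (pred_density.sum() + 1e-8)  # source histogram
    b = conf / (conf.sum() + 1e-8)                  # weighted target histogram
    cost = torch.cdist(pixel_xy, points) ** 2       # squared pixel distances
    cost = cost / (cost.max() + 1e-8)               # normalize for stability
    plan = sinkhorn(cost, a, b)
    return (plan * cost).sum()                      # total transport cost
```

In a full pipeline along the lines the abstract describes, such a loss would supervise the counter jointly: exactly on auto-generated point labels from synthetic frames, and softly (via the confidence weights) on pseudo point labels from real surveillance frames.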
Source Journal
CiteScore: 13.80
Self-citation rate: 27.40%
Annual publications: 660
Review time: 5 months
Journal Introduction: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.