Category-Contrastive Fine-Grained Crowd Counting and Beyond

Impact Factor: 8.4 | Zone 1, Computer Science | JCR Q1: Computer Science, Information Systems | IEEE Transactions on Multimedia | Pub Date: 2024-12-24 | DOI: 10.1109/TMM.2024.3521823

Meijing Zhang;Mengxue Chen;Qi Li;Yanchen Chen;Rui Lin;Xiaolian Li;Shengfeng He;Wenxi Liu

IEEE Transactions on Multimedia, vol. 27, pp. 477-488. Journal Article. https://ieeexplore.ieee.org/document/10814710/

Citations: 0

Abstract

Crowd counting has drawn increasing attention across various fields. However, existing crowd counting tasks primarily focus on estimating the overall population, ignoring the behavioral and semantic information of different social groups within the crowd. In this paper, we aim to address a newly proposed research problem, namely fine-grained crowd counting, which involves identifying different categories of individuals and accurately counting them in static images. In order to fully leverage the categorical information in static crowd images, we propose a two-tier salient feature propagation module designed to sequentially extract semantic information from both the crowd and its surrounding environment. Additionally, we introduce a category difference loss to refine the feature representation by highlighting the differences between various crowd categories. Moreover, our proposed framework can adapt to a novel problem setup called few-example fine-grained crowd counting. This setup, unlike the original fine-grained crowd counting, requires only a few exemplar point annotations instead of dense annotations from predefined categories, making it applicable in a wider range of scenarios. The baseline model for this task can be established by substituting the loss function in our proposed model with a novel hybrid loss function that integrates point-oriented cross-entropy loss and category contrastive loss. Through comprehensive experiments, we present results in both the formulation and application of fine-grained crowd counting.
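The abstract names a hybrid loss that combines a point-oriented cross-entropy loss with a category contrastive loss but gives no formulas. The sketch below is only one plausible reading, not the authors' implementation: it assumes the cross-entropy is a softmax loss evaluated solely at the annotated exemplar points, and it swaps in a standard supervised-contrastive (InfoNCE-style) term for the contrastive part. The function names, the `temperature` value, and the `weight` parameter are all hypothetical.

```python
import numpy as np


def point_cross_entropy(logits, points, labels):
    """Mean softmax cross-entropy over annotated exemplar points only (assumed form).

    logits: (C, H, W) per-pixel category scores
    points: (N, 2) integer (row, col) exemplar locations
    labels: (N,) integer category ids
    """
    rows, cols = points[:, 0], points[:, 1]
    z = logits[:, rows, cols].T                       # (N, C) scores at the points
    z = z - z.max(axis=1, keepdims=True)              # stabilise the softmax
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), labels].mean())


def category_contrastive(features, points, labels, temperature=0.1):
    """Supervised-contrastive term on exemplar features (a stand-in, not the paper's loss).

    Pulls features of same-category exemplars together and pushes
    different-category exemplars apart. features: (D, H, W).
    """
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    same = (labels[:, None] == labels[None, :]) & not_self
    has_pos = same.any(axis=1)
    if not has_pos.any():                             # no positive pairs to contrast
        return 0.0

    rows, cols = points[:, 0], points[:, 1]
    f = features[:, rows, cols].T                     # (N, D) features at the points
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-12)
    sim = np.where(not_self, f @ f.T / temperature, -np.inf)

    # Row-wise log-softmax over all other exemplars (self excluded).
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))

    pos_sum = np.where(same, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_sum[has_pos] / same.sum(axis=1)[has_pos]
    return float(per_anchor.mean())


def hybrid_loss(logits, features, points, labels, weight=1.0):
    """Weighted sum of the two terms; `weight` is a hypothetical balancing factor."""
    ce = point_cross_entropy(logits, points, labels)
    con = category_contrastive(features, points, labels)
    return ce + weight * con
```

In this reading, only the handful of exemplar points contribute to either term, which mirrors the few-example setup described above: no dense per-person category annotation is needed to compute the loss.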




Source journal

IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)

CiteScore: 11.70

Self-citation rate: 11.00%

Articles published: 576

Review time: 5.5 months

Journal description: The IEEE Transactions on Multimedia covers multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. Its scope aligns with the Fields of Interest of its sponsors.