CMPNet: A cross-modal multi-scale perception network for RGB-T crowd counting

Future Generation Computer Systems-The International Journal of Escience. IF 6.2; Zone 2, Computer Science; Q1, COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-11-06. DOI: 10.1016/j.future.2024.107596

Shihui Zhang, Kun Chen, Gangzheng Zhai, He Li, Shaojie Han

Volume 164, Article 107596. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0167739X24005600

Citations: 0

Abstract

CMPNet: A cross-modal multi-scale perception network for RGB-T crowd counting

Cross-modal crowd counting methods adapt better to scenes under complex conditions by introducing independent supplementary information. However, existing methods still suffer from insufficient fusion of modal features, underutilization of crowd structure, and neglect of scale information. To address these issues, this paper proposes a cross-modal multi-scale perception network (CMPNet). CMPNet mainly consists of a cross-modal perception fusion module and a multi-scale feature aggregation module. The cross-modal perception fusion module suppresses noise features while sharing features between modalities, thereby improving the robustness of the crowd counting process. The multi-scale feature aggregation module obtains rich crowd structure information through a spatial-context-aware graph convolution unit and then integrates feature information from different scales to enhance the network's ability to perceive crowd density. To the best of our knowledge, CMPNet is the first attempt to model the crowd structure and mine its semantics in the field of cross-modal crowd counting. The experimental results show that CMPNet achieves state-of-the-art performance on all RGB-T datasets, providing an effective solution for cross-modal crowd counting. We will release the code at https://github.com/KunChenKKK/CMPNet.
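The abstract names three ingredients: gated sharing of features between modalities, a graph convolution over crowd structure, and aggregation across scales. The NumPy sketch below illustrates those generic ideas only; it is not the authors' implementation, which the abstract does not specify. The function names, the per-pixel sigmoid gate, the 4-neighbour grid graph, and the pooling scales (1, 2, 4) are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(rgb, thermal, w_gate):
    """Blend two (C, H, W) feature maps with a per-pixel gate in (0, 1).
    Hypothetical stand-in for a cross-modal fusion step."""
    stacked = np.concatenate([rgb, thermal], axis=0)         # (2C, H, W)
    gate = sigmoid(np.einsum("c,chw->hw", w_gate, stacked))  # (H, W)
    return gate * rgb + (1.0 - gate) * thermal

def grid_adjacency(h, w):
    """Row-normalized 4-neighbour adjacency with self-loops on an h x w grid."""
    n = h * w
    adj = np.eye(n)
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if r + 1 < h:
                adj[i, i + w] = adj[i + w, i] = 1.0
            if c + 1 < w:
                adj[i, i + 1] = adj[i + 1, i] = 1.0
    return adj / adj.sum(axis=1, keepdims=True)

def graph_conv(feat, weight):
    """One graph-convolution step, ReLU(A X W), with spatial positions as nodes."""
    c, h, w = feat.shape
    nodes = feat.reshape(c, h * w).T                 # (N, C)
    out = grid_adjacency(h, w) @ nodes @ weight      # (N, C_out)
    return np.maximum(out, 0.0).T.reshape(-1, h, w)  # (C_out, H, W)

def avg_pool(feat, k):
    """Non-overlapping k x k average pooling; H and W must be divisible by k."""
    c, h, w = feat.shape
    return feat.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def multi_scale_aggregate(feat, scales=(1, 2, 4)):
    """Pool at each scale, upsample back by repetition, and average the results."""
    outs = []
    for k in scales:
        pooled = avg_pool(feat, k)
        outs.append(pooled.repeat(k, axis=1).repeat(k, axis=2))
    return np.mean(outs, axis=0)

# Toy inputs: random 8-channel, 8 x 8 feature maps for each modality.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 8, 8))
thermal = rng.standard_normal((8, 8, 8))

fused = gated_fusion(rgb, thermal, rng.standard_normal(16))
structured = graph_conv(fused, rng.standard_normal((8, 8)))
aggregated = multi_scale_aggregate(structured)
density = aggregated.sum(axis=0)   # toy (H, W) density map
count = float(density.sum())       # a count is the integral of the density map
```

With random weights the resulting count is meaningless; the sketch only shows how the three stages compose and how a count is read off a density map by summation.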

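Source journal

Future Generation Computer Systems-The International Journal of Escience (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months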

Journal description: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.
