Wanru Song, Xinyi Wang, Weimin Wu, Yuan Zhang, Feng Liu
Journal: Applied Intelligence, volume 55, issue 1 (Journal Article; impact factor 3.4; JCR Q2, Computer Science, Artificial Intelligence)
Published: 2024-11-19
DOI: 10.1007/s10489-024-06057-x
URL: https://link.springer.com/article/10.1007/s10489-024-06057-x
Citations: 0
Channel enhanced cross-modality relation network for visible-infrared person re-identification
Abstract
Visible-infrared person re-identification (VI Re-ID) aims to retrieve pedestrians across non-overlapping visible and infrared cameras, and it is widely employed in intelligent surveillance. One of the main challenges of the VI Re-ID task is the large modality discrepancy between visible and infrared images, so mining more shared features across modalities becomes an important issue. To address this problem, this paper proposes a novel framework for feature learning and feature embedding in VI Re-ID, namely the Channel Enhanced Cross-modality Relation Network (CECR-Net). The network contains three key modules. In the first module, to shorten the distance between the original modalities, a channel selection operation is applied to the visible images, and robustness against color variations is improved by randomly generating three-channel R/G/B images. The module also exploits the low- and mid-level information of the visible and auxiliary-modality images through a feature parameter-sharing strategy. Considering that the body-part sequence of a pedestrian changes little across modalities, CECR-Net designs two modules based on relation networks for VI Re-ID, namely the intra-relation learning and the cross-relation learning modules. These two modules help to capture the structural relationship between body parts, which is modality-invariant information, and thereby break the isolation between local features. Extensive experiments on two public benchmarks indicate that CECR-Net outperforms state-of-the-art methods. In particular, on the SYSU-MM01 dataset, Rank-1 accuracy and mAP reach 76.83% and 71.56%, respectively, in the "All Search" mode.
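The abstract does not spell out how the channel selection operation works. As a rough illustration only, the sketch below assumes it means picking one of the R, G or B channels at random and replicating it across all three channels, which removes color information while keeping a three-channel input. The function name `random_channel_image` and the list-of-tuples image layout are hypothetical and are not taken from the paper.

```python
import random


def random_channel_image(image, rng=random):
    """Build a three-channel image from one randomly chosen colour channel.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    Assumption (not confirmed by the abstract): "channel selection" means
    choosing one of R, G or B and copying it into all three channels.
    """
    channel = rng.randrange(3)  # 0 = R, 1 = G, 2 = B
    return [[(pixel[channel],) * 3 for pixel in row] for row in image]


# A tiny 1x2 "visible" image, used purely for illustration.
visible = [[(200, 120, 30), (10, 60, 250)]]
augmented = random_channel_image(visible, random.Random(0))
```

Every pixel in `augmented` has three equal values taken from the same channel of the corresponding input pixel, so the result resembles a single-channel image while keeping the shape expected by a network trained on RGB input. A real pipeline would more likely operate on image tensors than on nested lists.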
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.