Di Wu, Zhihui Liu, Zihan Chen, Shenglong Gan, Kaiwen Tan, Qin Wan, Yaonan Wang
Journal: Expert Systems with Applications, Volume 263, Article 125716
DOI: 10.1016/j.eswa.2024.125716
Publication date: 2024-11-12 (Journal Article)
Impact factor: 7.5; JCR Q1, Computer Science, Artificial Intelligence
URL: https://www.sciencedirect.com/science/article/pii/S0957417424025831
LRMM: Low rank multi-scale multi-modal fusion for person re-identification based on RGB-NI-TI
Person re-identification is a crucial task in video surveillance that aims to match person images across non-overlapping camera views. Recent methods introduce the Near-Infrared (NI) modality to alleviate the limitations of the traditional single visible-light modality under low-light conditions, but they overlook the importance of modality-related information. To incorporate additional complementary information into traditional person re-identification, this paper proposes a novel RGB-NI-TI multi-modal person re-identification approach. First, we design a multi-scale multi-modal interaction module to facilitate cross-modal information fusion across multiple scales. Second, we propose a low-rank multi-modal fusion module that decomposes features and weights in parallel and then employs low-rank modality-specific factors for multimodal fusion, making the model more efficient at fusing multiple modal features while reducing complexity. Finally, we propose a multiple-modality prototype loss that supervises the network jointly with the cross-entropy loss, forcing the network to learn modality-specific information by increasing intra-class cross-modality similarity and enlarging inter-class differences. Experimental results on benchmark multi-modal Re-ID datasets (RGBNT201, RGBNT100, MSVR310) and constructed person Re-ID datasets (multimodal versions of Market1501 and PRW) validate the effectiveness of the proposed approach against state-of-the-art methods.
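The low-rank fusion step described in the abstract follows the general low-rank multimodal fusion idea: instead of materialising the full outer product of the RGB, NI, and TI features and contracting it with a dense weight tensor, each modality is projected by its own rank-r factor matrices, the projections are multiplied elementwise, and the rank dimension is summed out. The sketch below is an illustration of that general scheme under assumed feature dimensions and rank, not the paper's exact module; all names and shapes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: three modality features, a shared output size, and a rank.
d_rgb, d_ni, d_ti, d_out, rank = 8, 8, 8, 4, 3

# Low-rank modality-specific factors: one (rank, d_in + 1, d_out) array per
# modality; the extra input row acts as a bias via an appended constant 1.
factors = {
    "rgb": rng.standard_normal((rank, d_rgb + 1, d_out)) * 0.1,
    "ni":  rng.standard_normal((rank, d_ni + 1, d_out)) * 0.1,
    "ti":  rng.standard_normal((rank, d_ti + 1, d_out)) * 0.1,
}

def low_rank_fuse(feats, factors):
    """Fuse modality features without building the full outer-product tensor:
    project each modality with its rank-r factors, multiply the projections
    elementwise across modalities, then sum over the rank dimension."""
    fused = None
    for name, z in feats.items():
        z1 = np.append(z, 1.0)                            # append constant 1
        proj = np.einsum("i,rio->ro", z1, factors[name])  # (rank, d_out)
        fused = proj if fused is None else fused * proj   # elementwise product
    return fused.sum(axis=0)                              # (d_out,)

feats = {"rgb": rng.standard_normal(d_rgb),
         "ni":  rng.standard_normal(d_ni),
         "ti":  rng.standard_normal(d_ti)}
h = low_rank_fuse(feats, factors)
print(h.shape)  # (4,)
```

The efficiency gain is that the cost grows linearly in the number of modalities and the rank, whereas the equivalent dense weight tensor over the three-way outer product would have (d_rgb+1)(d_ni+1)(d_ti+1)·d_out entries.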
Journal overview:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.