{"title":"MSCMNet: Multi-scale Semantic Correlation Mining for Visible-Infrared Person Re-Identification","authors":"","doi":"10.1016/j.patcog.2024.111090","DOIUrl":null,"url":null,"abstract":"<div><div>The main challenge in the Visible-Infrared Person Re-Identification (VI-ReID) task lies in extracting discriminative features from different modalities for matching purposes. While existing studies primarily focus on reducing modal discrepancies, the modality information fails to be thoroughly exploited. To solve this problem, the Multi-scale Semantic Correlation Mining network (MSCMNet) is proposed to comprehensively exploit semantic features at multiple scales. The network fuses shallow-level features into the deep network through dimensionality reduction and mapping, and the fused features are utilized to minimize modality information loss in feature extraction. Firstly, considering the effective utilization of modality information, the Multi-scale Information Correlation Mining Block (MIMB) is designed to fuse features at different scales and explore the semantic correlation of fusion features. Secondly, in order to enrich the semantic information that MIMB can utilize, the Quadruple-stream Feature Extractor (QFE) with non-shared parameters is specifically designed to extract information from different dimensions of the dataset. Finally, the Quadruple Center Triplet Loss (QCT) is further proposed to address the information discrepancy in the comprehensive features. Extensive experiments on the SYSU-MM01, RegDB, and LLCM datasets demonstrate that the proposed MSCMNet achieves the greatest accuracy. We release the source code on <span><span>https://github.com/Hua-XC/MSCMNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":7.5000,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320324008410","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
The main challenge in the Visible-Infrared Person Re-Identification (VI-ReID) task lies in extracting discriminative features from different modalities for matching purposes. While existing studies primarily focus on reducing modal discrepancies, modality information is not thoroughly exploited. To solve this problem, the Multi-scale Semantic Correlation Mining network (MSCMNet) is proposed to comprehensively exploit semantic features at multiple scales. The network fuses shallow-level features into the deep network through dimensionality reduction and mapping, and the fused features are used to minimize the loss of modality information during feature extraction. Firstly, to make effective use of modality information, the Multi-scale Information Correlation Mining Block (MIMB) is designed to fuse features at different scales and explore the semantic correlation of the fused features. Secondly, to enrich the semantic information available to MIMB, the Quadruple-stream Feature Extractor (QFE) with non-shared parameters is designed to extract information from different dimensions of the dataset. Finally, the Quadruple Center Triplet Loss (QCT) is proposed to address the information discrepancy in the comprehensive features. Extensive experiments on the SYSU-MM01, RegDB, and LLCM datasets demonstrate that the proposed MSCMNet achieves the highest accuracy. We release the source code at https://github.com/Hua-XC/MSCMNet.
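To make the center-based loss idea described above more concrete, the following PyTorch-style sketch shows one plausible form of a center triplet loss computed between visible and infrared features. This is a simplified two-stream illustration, not the paper's Quadruple Center Triplet Loss: the function name, the margin value, the hardest-negative mining strategy, and the restriction to two modality streams are all assumptions for illustration.

```python
# Minimal sketch (not the paper's implementation) of a center-based triplet
# loss for visible-infrared features. Assumptions: features arrive as two
# tensors of shape (batch, dim) with matching identity labels; the margin and
# the cross-modal center pairing are illustrative choices.
import torch
import torch.nn.functional as F


def center_triplet_loss(vis_feat, ir_feat, labels, margin=0.3):
    """Pull same-identity centers of the two modalities together and push
    centers of different identities apart by at least `margin`."""
    ids = labels.unique()
    # Per-identity feature centers for each modality.
    vis_centers = torch.stack([vis_feat[labels == i].mean(0) for i in ids])
    ir_centers = torch.stack([ir_feat[labels == i].mean(0) for i in ids])

    # Positive distance: same identity, different modality.
    pos = F.pairwise_distance(vis_centers, ir_centers)

    # Hardest negative: closest cross-modal center of a different identity.
    dist = torch.cdist(vis_centers, ir_centers)   # (n_ids, n_ids)
    dist.fill_diagonal_(float('inf'))             # mask same-identity pairs
    hardest_neg = dist.min(dim=1).values

    return F.relu(pos - hardest_neg + margin).mean()


if __name__ == "__main__":
    # Toy usage: 8 samples per modality, 4 identities, 256-d features.
    labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
    vis = torch.randn(8, 256)
    ir = torch.randn(8, 256)
    print(center_triplet_loss(vis, ir, labels).item())
```

Working on identity centers rather than individual samples keeps the number of distance terms small per batch and tends to stabilize cross-modal training; the paper's actual QCT formulation over four streams may differ.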
Journal introduction:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.