Advancing Visible-Infrared Person Re-Identification: Synergizing Visual-Textual Reasoning and Cross-Modal Feature Alignment

Impact Factor: 8.0 · CAS Tier 1 (Computer Science) · JCR Q1 (Computer Science, Theory & Methods) · IEEE Transactions on Information Forensics and Security · Pub Date: 2025-02-11 · DOI: 10.1109/TIFS.2025.3539946
Yuxuan Qiu;Liyang Wang;Wei Song;Jiawei Liu;Zhiping Shi;Na Jiang
Volume 20, pp. 2184-2196 · Journal Article · Full text: https://ieeexplore.ieee.org/document/10879282/
Citations: 0

Abstract

Visible-infrared person re-identification (VI-ReID) is a critical cross-modality fine-grained classification task with significant implications for public safety and security applications. Existing VI-ReID methods primarily focus on extracting modality-invariant features for person retrieval. However, because infrared images inherently lack texture information, these modality-invariant features tend to emphasize global contexts. Consequently, individuals with similar silhouettes are often misidentified, posing potential risks to security systems and forensic investigations. To address this problem, this paper introduces natural language descriptions to learn global-local contexts for VI-ReID. Specifically, we design a framework that jointly optimizes visible-infrared alignment plus (VIAP) and visual-textual reasoning (VTR), introduces a local-global joint measure (LJM) to strengthen the distance metric, and proposes a human-LLM collaborative approach to incorporate textual descriptions into existing cross-modal person re-identification datasets. VIAP achieves cross-modal alignment between RGB and IR: it explicitly uses the designed frequency-aware modality alignment and relationship-reinforced fusion to exploit local cues within global features and modality-invariant information. VTR proposes pooling selection and dual-level reasoning mechanisms that force the image encoder to attend to salient regions indicated by the textual descriptions. LJM introduces local feature distances into the matching-stage metric, using fine-grained information to improve match relevance. Extensive experimental results on the popular SYSU-MM01 and RegDB datasets show that the proposed method significantly outperforms state-of-the-art approaches. The dataset is publicly available at https://github.com/qyx596/vireid-caption.
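The local-global joint measure (LJM) described above augments a global distance with part-level feature distances at the matching stage. As an illustration only (the paper's exact formulation is not reproduced here), a weighted fusion of global and local cosine distances might look like the following sketch; the function name `joint_distance`, the weight `alpha`, and the equal-weight averaging over parts are all hypothetical choices, not details from the paper:

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two feature vectors (0 = identical direction)."""
    a = a / (np.linalg.norm(a) + 1e-12)
    b = b / (np.linalg.norm(b) + 1e-12)
    return 1.0 - float(a @ b)

def joint_distance(q_global, g_global, q_parts, g_parts, alpha=0.5):
    """Fuse a global distance with the mean of part-level (local) distances.

    q_parts / g_parts: lists of part feature vectors for query and gallery.
    alpha weighs the local term; alpha=0 recovers a purely global metric.
    """
    d_global = cosine_distance(q_global, g_global)
    d_local = float(np.mean([cosine_distance(qp, gp)
                             for qp, gp in zip(q_parts, g_parts)]))
    return (1.0 - alpha) * d_global + alpha * d_local
```

The intuition matches the abstract: silhouette-alike identities may have near-identical global distances, so the local term lets fine-grained part differences re-rank them.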
Source Journal
IEEE Transactions on Information Forensics and Security
Engineering Technology · Engineering: Electrical & Electronic
CiteScore: 14.40
Self-citation rate: 7.40%
Articles per year: 234
Review time: 6.5 months
Journal description: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.