RGB-T Tracking With Template-Bridged Search Interaction and Target-Preserved Template Updating

Bo Li;Fengguang Peng;Tianrui Hui;Xiaoming Wei;Xiaolin Wei;Lijun Zhang;Hang Shi;Si Liu
{"title":"RGB-T Tracking With Template-Bridged Search Interaction and Target-Preserved Template Updating","authors":"Bo Li;Fengguang Peng;Tianrui Hui;Xiaoming Wei;Xiaolin Wei;Lijun Zhang;Hang Shi;Si Liu","doi":"10.1109/TPAMI.2024.3475472","DOIUrl":null,"url":null,"abstract":"The goal of RGB-Thermal (RGB-T) tracking is to utilize the synergistic and complementary strengths of RGB and TIR modalities to enhance tracking in diverse situations, with cross-modal interaction being a crucial element. Earlier methods often simply combine the features of the RGB and TIR search frames, leading to a coarse interaction that also introduced unnecessary background noise. Many other approaches sample candidate boxes from search frames and apply different fusion techniques to individual pairs of RGB and TIR boxes, which confines cross-modal interactions to local areas and results in insufficient context modeling. Additionally, mining video temporal contexts is also under-explored in RGB-T tracking. To alleviate these limitations, we propose a novel Template-Bridged Search region Interaction (TBSI) module that exploits templates as the medium to bridge the cross-modal interaction between RGB and TIR search regions by gathering and distributing target-relevant object and environment contexts. An Illumination Guided Fusion (IGF) module is designed to adaptively fuse RGB and TIR search region tokens with a global illumination factor. Furthermore, in the inference stage, we also propose an efficient Target-Preserved Template Updating (TPTU) strategy, leveraging the temporal context within video sequences to accommodate the target’s appearance change. Our proposed modules are integrated into a ViT backbone for joint feature extraction, search-template matching, and cross-modal interaction. Extensive experiments on three popular RGB-T tracking benchmarks demonstrate our method achieves new state-of-the-art performances.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 1","pages":"634-649"},"PeriodicalIF":18.6000,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10706882/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The goal of RGB-Thermal (RGB-T) tracking is to utilize the synergistic and complementary strengths of RGB and TIR modalities to enhance tracking in diverse situations, with cross-modal interaction being a crucial element. Earlier methods often simply combine the features of the RGB and TIR search frames, leading to a coarse interaction that also introduced unnecessary background noise. Many other approaches sample candidate boxes from search frames and apply different fusion techniques to individual pairs of RGB and TIR boxes, which confines cross-modal interactions to local areas and results in insufficient context modeling. Additionally, mining video temporal contexts is also under-explored in RGB-T tracking. To alleviate these limitations, we propose a novel Template-Bridged Search region Interaction (TBSI) module that exploits templates as the medium to bridge the cross-modal interaction between RGB and TIR search regions by gathering and distributing target-relevant object and environment contexts. An Illumination Guided Fusion (IGF) module is designed to adaptively fuse RGB and TIR search region tokens with a global illumination factor. Furthermore, in the inference stage, we also propose an efficient Target-Preserved Template Updating (TPTU) strategy, leveraging the temporal context within video sequences to accommodate the target’s appearance change. Our proposed modules are integrated into a ViT backbone for joint feature extraction, search-template matching, and cross-modal interaction. Extensive experiments on three popular RGB-T tracking benchmarks demonstrate our method achieves new state-of-the-art performances.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用模板网格搜索交互和目标保留模板更新进行 RGB-T 跟踪
RGB-热(RGB- t)跟踪的目标是利用RGB和TIR模式的协同和互补优势,加强在不同情况下的跟踪,跨模式交互是一个关键因素。早期的方法通常简单地结合RGB和TIR搜索帧的特征,导致粗糙的相互作用,也引入了不必要的背景噪声。许多其他方法从搜索框架中采样候选框,并将不同的融合技术应用于RGB和TIR框的单个对,这将跨模态交互限制在局部区域,导致上下文建模不足。此外,挖掘视频时间上下文在RGB-T跟踪中也未得到充分的研究。为了减轻这些限制,我们提出了一种新的模板桥接搜索区域交互(TBSI)模块,该模块利用模板作为媒介,通过收集和分发与目标相关的对象和环境上下文来桥接RGB和TIR搜索区域之间的跨模态交互。设计了光照引导融合(IGF)模块,利用全局光照因子自适应融合RGB和TIR搜索区域令牌。此外,在推理阶段,我们还提出了一种有效的目标保留模板更新(TPTU)策略,利用视频序列中的时间上下文来适应目标的外观变化。我们提出的模块被集成到ViT主干中,用于联合特征提取、搜索模板匹配和跨模态交互。在三个流行的RGB-T跟踪基准上进行的大量实验表明,我们的方法实现了新的最先进的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation. Continuous Review and Timely Correction: Enhancing the Resistance to Noisy Labels via Self-Not-True and Class-Wise Distillation. On the Transferability and Discriminability of Representation Learning in Unsupervised Domain Adaptation. Fast Multi-view Discrete Clustering via Spectral Embedding Fusion. GrowSP++: Growing Superpoints and Primitives for Unsupervised 3D Semantic Segmentation.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1