RGBT tracking via frequency-aware feature enhancement and unidirectional mixed attention

IF 5.5 · CAS Region 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Neurocomputing · Pub Date: 2024-11-19 · DOI: 10.1016/j.neucom.2024.128908
Jianming Zhang, Jing Yang, Zikang Liu, Jin Wang
Neurocomputing, Volume 616, Article 128908.
Citation count: 0

Abstract

RGBT object tracking is widely used due to the complementary nature of the RGB and TIR modalities. However, RGBT trackers based on Transformers or CNNs face significant challenges in effectively enhancing and extracting features from one modality and fusing them into the other. To achieve effective regional feature representation and adequate information fusion, we propose a novel tracking method that employs frequency-aware feature enhancement and bidirectional multistage feature fusion. Firstly, we propose an Early Region Feature Enhancement (ERFE) module, which is composed of the Frequency-aware Self-region Feature Enhancement (FSFE) block and the Cross-attention Cross-region Feature Enhancement (CCFE) block. The FFT-based FSFE block enhances the features of the template and search region separately, while the CCFE block improves feature representation by considering the template and search region jointly. Secondly, we propose a Bidirectional Multistage Feature Fusion (BMFF) module, with the Complementary Feature Extraction Attention (CFEA) module as its core component. The CFEA module, which includes the Unidirectional Mixed Attention (UMA) block and the Context Focused Attention (CFA) block, extracts information from one modality. When RGB is the primary modality, TIR is the auxiliary modality, and vice versa. The auxiliary-modality features processed by CFEA are added to the primary-modality features; this information fusion process is bidirectional and multistage. Thirdly, extensive experiments on three benchmark datasets — RGBT234, LaSHeR, and GTOT — demonstrate that our tracker outperforms advanced RGBT tracking methods.
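The abstract describes the FSFE block as an FFT-based enhancement applied to the template or search-region features. A minimal sketch of that idea, assuming a simple design in which features are mapped to the frequency domain, reweighted by a learnable per-frequency filter, and mapped back with a residual connection (the actual filter parameterization inside FSFE is not given in the abstract):

```python
import numpy as np

def frequency_enhance(feat, freq_filter):
    """Sketch of FFT-based feature enhancement (assumed form of FSFE):
    feat: real feature map of shape (C, H, W);
    freq_filter: real weights of shape (C, H, W//2 + 1), one per rFFT bin."""
    spec = np.fft.rfft2(feat, axes=(-2, -1))            # to frequency domain
    spec_enh = spec * freq_filter                       # per-frequency reweighting
    enhanced = np.fft.irfft2(spec_enh, s=feat.shape[-2:], axes=(-2, -1))
    return feat + enhanced                              # residual add (assumed)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))                      # toy (C, H, W) features
all_pass = np.ones((4, 8, 5))                           # identity filter
y = frequency_enhance(x, all_pass)                      # with an all-pass filter,
                                                        # the enhancement branch
                                                        # reproduces the input
```

With an all-pass filter the FFT round trip is the identity, so the output reduces to the residual sum `x + x`; a learned filter would instead amplify or suppress individual frequency bands.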
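The primary/auxiliary fusion can likewise be sketched in simplified form. Assuming UMA behaves like cross-attention in which queries come from the primary modality and keys/values from the auxiliary one, with the attended auxiliary features added back to the primary stream (projection matrices and the CFA block are omitted; this is not the paper's exact formulation):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def unidirectional_fuse(primary, auxiliary):
    """Sketch of one fusion direction: primary tokens attend to
    auxiliary tokens, and the attended auxiliary information is
    added to the primary features. Shapes: (tokens, dim)."""
    scale = np.sqrt(primary.shape[-1])
    attn = softmax(primary @ auxiliary.T / scale)       # (tokens, tokens)
    return primary + attn @ auxiliary                   # auxiliary info -> primary

rng = np.random.default_rng(1)
rgb = rng.standard_normal((16, 32))                     # toy RGB tokens
tir = rng.standard_normal((16, 32))                     # toy TIR tokens
fused_rgb = unidirectional_fuse(rgb, tir)               # RGB as primary modality
fused_tir = unidirectional_fuse(tir, rgb)               # TIR as primary modality
```

Running the same block in both directions, as in the last two lines, mirrors the abstract's point that the fusion is bidirectional; repeating it at several backbone stages would make it multistage.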
Source journal: Neurocomputing (Engineering/Technology — Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles per year: 1382
Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.
Latest articles in this journal:
- Monocular thermal SLAM with neural radiance fields for 3D scene reconstruction
- Learning a more compact representation for low-rank tensor completion
- An HVS-derived network for assessing the quality of camouflaged targets with feature fusion
- Global Span Semantic Dependency Awareness and Filtering Network for nested named entity recognition
- A user behavior-aware multi-task learning model for enhanced short video recommendation