Spatio-temporal mix deformable feature extractor in visual tracking

Expert Systems with Applications (IF 7.5; Zone 1, Computer Science; JCR Q1, Computer Science, Artificial Intelligence). Pub Date: 2023-09-09. DOI: 10.1016/j.eswa.2023.121377
Yucheng Huang, Ziwang Xiao, Eksan Firkat, Jinlai Zhang, Danfeng Wu, Askar Hamdulla
Article page: https://www.sciencedirect.com/science/article/pii/S0957417423018791
Citations: 0

Abstract

ACMix integrates convolution and self-attention mechanisms and leverages the advantages of both. However, it has difficulty associating temporal sequences and struggles to sample features accurately. In addition, its global correlation ability makes it susceptible to interference from irrelevant information. To address these issues, we propose the Spatio-Temporal Deformable Mix Feature Extractor (STD-ME), which is based on ACMix. In STD-ME, we design deformable modules for both the convolution and attention branches and incorporate spatio-temporal context to enable more precise feature sampling. We integrate STD-ME into a tracker that employs multi-frame fusion to further enhance its performance.

Crop–Transform–Paste, which synthesizes training data manually, offers a novel perspective on self-supervised tracking. Although this method has shown impressive results, the synthesized data lack spatio-temporal continuity in attributes such as scale variation, rotation, illumination variation, position, and partial occlusion, which limits how well they match real-world scenarios. Consequently, trackers based on multi-frame fusion that are trained on such data may struggle to achieve significant breakthroughs. To overcome this limitation, we introduce Spatial–Temporal Transformation (STT). STT uses an Iterative Random Number Generator (IRNG) based on a normal distribution to probabilistically generate spatio-temporally continuous data. Finally, we conduct extensive experiments on STD-ME and STT to demonstrate the effectiveness of the proposed methods.
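The abstract does not define the IRNG, so the following is only a minimal sketch of one plausible reading: each frame's transformation parameter (here, a scale factor) is drawn from a normal distribution centred on the previous frame's value, which keeps consecutive frames close to each other. The function names, the clamping range, and the step size are all assumptions made for illustration and are not taken from the paper. An independently sampled baseline is included to show the kind of frame-to-frame discontinuity the abstract attributes to per-frame synthesis.

```python
import random


def irng_sequence(num_frames, start, step_std, low, high, seed=None):
    """Generate a temporally continuous parameter sequence (hypothetical helper).

    Each value is drawn from a normal distribution centred on the previous
    value and then clamped to [low, high]. This is an assumed interpretation
    of an "iterative random number generator", not the paper's definition.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    rng = random.Random(seed)
    values = [start]
    for _ in range(num_frames - 1):
        candidate = rng.gauss(values[-1], step_std)
        values.append(min(max(candidate, low), high))
    return values


def independent_sequence(num_frames, low, high, seed=None):
    """Baseline: sample every frame independently, with no temporal link."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(num_frames)]


def mean_abs_step(values):
    """Average absolute frame-to-frame change; smaller means smoother."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(steps) / len(steps)


if __name__ == "__main__":
    # Illustrative settings only: scale factor per frame for a pasted target.
    smooth = irng_sequence(8, start=1.0, step_std=0.03, low=0.5, high=2.0, seed=0)
    jumpy = independent_sequence(8, low=0.5, high=2.0, seed=0)
    print("iterative  :", [round(v, 3) for v in smooth])
    print("independent:", [round(v, 3) for v in jumpy])
    print("mean step  :", round(mean_abs_step(smooth), 3), "vs", round(mean_abs_step(jumpy), 3))
```

The same iterative scheme could in principle drive the other attributes the abstract lists (rotation, illumination, position, occlusion), with one sequence per attribute. Whether the paper couples these attributes or samples them separately is not stated in the abstract.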

Source journal
Expert Systems with Applications
Category: Engineering Technology - Engineering: Electrical & Electronic
CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months
Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.