Spatio-temporal mix deformable feature extractor in visual tracking

Expert Systems with Applications (IF 7.5; Zone 1, Computer Science; JCR Q1, Computer Science, Artificial Intelligence). Pub Date: 2023-09-09. DOI: 10.1016/j.eswa.2023.121377

Yucheng Huang, Ziwang Xiao, Eksan Firkat, Jinlai Zhang, Danfeng Wu, Askar Hamdulla

Expert Systems with Applications, vol. 237, Article 121377. Full text: https://www.sciencedirect.com/science/article/pii/S0957417423018791

Citation count: 0

Abstract

ACMix integrates convolution and self-attention and draws on the strengths of both. However, it has difficulty associating temporal sequences and sampling features accurately, and its global correlation makes it susceptible to interference from irrelevant information. To address these issues, we propose the Spatio-Temporal Deformable Mix Feature Extractor (STD-ME), built on ACMix. STD-ME adds deformable modules to both the convolution and attention branches and incorporates spatio-temporal context to enable more precise feature sampling. We integrate STD-ME into a tracker that employs multi-frame fusion, with the aim of further improving its performance. Separately, Crop–Transform–Paste, which synthesizes training data manually, offers a novel perspective for self-supervised tracking. Although this method has shown impressive results, its synthesized data lacks spatio-temporal continuity in attributes such as scale variation, rotation, illumination variation, position, and partial occlusion, which limits how well it matches real-world scenarios. Consequently, trackers based on multi-frame fusion may struggle to achieve significant improvements when trained on such data. To overcome this limitation, we introduce Spatial–Temporal Transformation (STT), which uses an Iterative Random Number Generator (IRNG) based on a normal distribution to probabilistically generate spatio-temporally continuous data. Finally, we conduct extensive experiments on STD-ME and STT to demonstrate the effectiveness of the proposed methods.
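The abstract does not specify how the IRNG works, so the following is only a minimal sketch of one plausible reading: each frame's transformation parameter is drawn from a normal distribution centered on the previous frame's value, so consecutive frames change smoothly rather than independently. The function names, attribute names, standard deviations, and clamping ranges are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import random


def iterative_normal_sequence(rng, start, sigma, steps, low, high):
    """Draw `steps` values; each one is sampled from N(previous, sigma)
    and clamped to [low, high]. This is an assumed reading of "iterative"."""
    if steps <= 0:
        return []
    values = [min(max(start, low), high)]
    for _ in range(steps - 1):
        proposal = rng.gauss(values[-1], sigma)  # centered on the last value
        values.append(min(max(proposal, low), high))
    return values


def synthesize_track(num_frames, seed=0):
    """Generate per-frame parameters for a synthetic sequence.
    Attribute names and ranges are illustrative assumptions."""
    rng = random.Random(seed)
    return {
        "scale": iterative_normal_sequence(rng, 1.0, 0.03, num_frames, 0.5, 2.0),
        "rotation_deg": iterative_normal_sequence(rng, 0.0, 2.0, num_frames, -45.0, 45.0),
        "brightness": iterative_normal_sequence(rng, 1.0, 0.02, num_frames, 0.6, 1.4),
        "x_offset": iterative_normal_sequence(rng, 0.0, 3.0, num_frames, -64.0, 64.0),
        "y_offset": iterative_normal_sequence(rng, 0.0, 3.0, num_frames, -64.0, 64.0),
    }


if __name__ == "__main__":
    track = synthesize_track(5)
    for name, sequence in track.items():
        print(name, [round(v, 3) for v in sequence])
```

Under this assumption, a small standard deviation keeps frame-to-frame changes small, which is the kind of continuity that independently sampled per-frame transformations would lack.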




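Source journal: Expert Systems with Applications (Engineering & Technology; Engineering: Electrical & Electronic)
CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months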

Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.
