Spatio-temporal mix deformable feature extractor in visual tracking
Yucheng Huang, Ziwang Xiao, Eksan Firkat, Jinlai Zhang, Danfeng Wu, Askar Hamdulla
Expert Systems with Applications (JCR Q1, Computer Science, Artificial Intelligence; impact factor 7.5), journal article, published 2023-09-09
DOI: 10.1016/j.eswa.2023.121377
https://www.sciencedirect.com/science/article/pii/S0957417423018791
Citation count: 0
Abstract
ACMix integrates convolution and self-attention, leveraging the advantages of both. However, it has difficulty associating information across temporal sequences and struggles to sample features accurately. In addition, its global correlation makes it susceptible to interference from irrelevant information. To address these issues, we propose the Spatio-Temporal Deformable Mix Feature Extractor (STD-ME), built on ACMix. STD-ME adds deformable modules to both the convolution and attention branches and incorporates spatio-temporal context, enabling more precise feature sampling. We integrate STD-ME into a tracker that employs multi-frame fusion to further improve its performance. Crop–Transform–Paste, which synthesizes training data manually, offers a novel perspective on self-supervised tracking. Although this method has shown impressive results, the synthesized data lacks spatio-temporal continuity in attributes such as scale variation, rotation, illumination variation, position, and partial occlusion. This limits how well the data matches real-world scenarios. Consequently, multi-frame fusion trackers trained on such data may struggle to achieve significant breakthroughs. To overcome this limitation, we introduce Spatial–Temporal Transformation (STT). STT uses an Iterative Random Number Generator (IRNG) based on a normal distribution to probabilistically generate spatio-temporally continuous data. Finally, we conducted extensive experiments on STD-ME and STT to demonstrate the effectiveness of the proposed methods.
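The abstract describes STT's Iterative Random Number Generator only at a high level. The sketch below rests on one assumption: "iterative" means each frame's transform parameters are drawn from a normal distribution centred on the previous frame's values, which is a Gaussian random walk. Under that assumption, a spatio-temporally continuous sequence of Crop–Transform–Paste parameters could be generated as shown. Everything else here is hypothetical and is not the authors' IRNG:

- the class `TransformState` and the function `iterative_transforms`;
- the choice of parameters (scale, rotation, illumination, position, occlusion);
- the step sizes in `STEP_SIGMA`;
- the clamping ranges.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TransformState:
    """Hypothetical per-frame augmentation parameters for one synthesized frame."""
    scale: float       # relative object scale (1.0 = original size)
    angle: float       # rotation in degrees
    brightness: float  # multiplicative illumination factor (1.0 = unchanged)
    dx: float          # horizontal offset of the pasted object, in pixels
    dy: float          # vertical offset of the pasted object, in pixels
    occlusion: float   # fraction of the object covered by an occluder


# Hypothetical per-step standard deviations; NOT values taken from the paper.
STEP_SIGMA = TransformState(scale=0.02, angle=2.0, brightness=0.03,
                            dx=3.0, dy=3.0, occlusion=0.05)


def clamp(x: float, lo: float, hi: float) -> float:
    """Keep a parameter inside a plausible range."""
    return max(lo, min(hi, x))


def iterative_transforms(num_frames: int, seed: Optional[int] = None,
                         init: Optional[TransformState] = None) -> List[TransformState]:
    """Generate a temporally continuous sequence of transform parameters.

    Assumed scheme (illustrative only): each frame's value is sampled from a
    normal distribution centred on the previous frame's value, then clamped,
    so neighbouring frames differ only slightly.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    rng = random.Random(seed)
    state = init if init is not None else TransformState(
        scale=1.0, angle=0.0, brightness=1.0, dx=0.0, dy=0.0, occlusion=0.0)
    states = [state]
    for _ in range(num_frames - 1):
        # New value ~ N(previous value, sigma^2), clamped where a range matters.
        state = TransformState(
            scale=clamp(rng.gauss(state.scale, STEP_SIGMA.scale), 0.5, 2.0),
            angle=rng.gauss(state.angle, STEP_SIGMA.angle),
            brightness=clamp(rng.gauss(state.brightness, STEP_SIGMA.brightness), 0.5, 1.5),
            dx=rng.gauss(state.dx, STEP_SIGMA.dx),
            dy=rng.gauss(state.dy, STEP_SIGMA.dy),
            occlusion=clamp(rng.gauss(state.occlusion, STEP_SIGMA.occlusion), 0.0, 0.5),
        )
        states.append(state)
    return states


if __name__ == "__main__":
    # Print a short, reproducible sequence of per-frame parameters.
    for i, s in enumerate(iterative_transforms(5, seed=0)):
        print(i, round(s.scale, 3), round(s.angle, 1), round(s.dx, 1), round(s.occlusion, 2))
```

Sampling each frame's parameters independently would let scale, rotation, or position jump arbitrarily from one frame to the next. Drawing each value around the previous one keeps neighbouring frames close. This is one plausible way to obtain the spatio-temporal continuity the abstract says Crop–Transform–Paste data lacks.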
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.