VTST: Efficient Visual Tracking With a Stereoscopic Transformer
Fengwei Gu; Jun Lu; Chengtao Cai; Qidan Zhu; Zhaojie Ju
IEEE Transactions on Emerging Topics in Computational Intelligence (Journal Article)
DOI: 10.1109/TETCI.2024.3360303
Published: 2024-02-12
URL: https://ieeexplore.ieee.org/document/10433230/
Citations: 0
Abstract
Although Siamese trackers have become increasingly prevalent in the visual tracking domain, they are easily disrupted by semantic distractors in complex environments, which leads to the underutilization of feature information. When multiple disturbances act together, in particular, the performance of many trackers degrades severely. To address this problem, this paper presents a robust Stereoscopic Transformer network that improves tracking performance. Built on a hybrid attention mechanism, our method comprises a channel feature awareness network (CFAN), a global channel attention network (GCAN), and a multi-level feature enhancement unit (MFEU). Concretely, CFAN focuses on specific channel information, highlighting the target features it contains while weakening semantic distractor features. As an intermediate hub, GCAN is mainly responsible for establishing global feature dependencies between the search region and the template, while selecting the channel features of interest to improve the discriminative ability of the model. In particular, MFEU enhances multi-level feature information to facilitate feature representation learning. Finally, a Transformer-based Siamese tracker (named VTST) is proposed to provide an efficient tracking representation that holds up under a variety of challenging attributes. Experiments show that our method outperforms state-of-the-art trackers on multiple benchmarks at a real-time speed of 56.0 fps.
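The paper does not publish its CFAN implementation here, but the idea of reweighting channels so that target-relevant features are amplified and distractor features are suppressed follows the general squeeze-and-excitation pattern of channel attention. The sketch below is purely illustrative: the two projection matrices stand in for learned weights, and the function name and shapes are assumptions, not the authors' code.

```python
import numpy as np

def channel_attention(feature_map, reduction=4):
    """Illustrative squeeze-and-excitation-style channel attention.

    feature_map: array of shape (C, H, W).
    Returns the feature map reweighted per channel, so informative
    channels are emphasized and distractor channels are suppressed.
    """
    c, _, _ = feature_map.shape
    # Squeeze: global average pooling over spatial dims -> (C,)
    squeezed = feature_map.mean(axis=(1, 2))
    # Excitation: random projections stand in for learned FC layers
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) / np.sqrt(c)
    w2 = rng.standard_normal((c, c // reduction)) / np.sqrt(c // reduction)
    hidden = np.maximum(w1 @ squeezed, 0.0)          # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # sigmoid, in (0, 1)
    # Scale: broadcast the per-channel gate over the spatial dims
    return feature_map * gate[:, None, None]

x = np.ones((8, 4, 4))
y = channel_attention(x)
print(y.shape)  # (8, 4, 4)
```

Because the gate is a sigmoid, every channel is scaled by a factor strictly between 0 and 1; in a trained network those factors would be learned to favor target channels over distractor channels.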
Journal Introduction:
The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.
TETCI is an electronic-only publication and publishes six issues per year.
Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. Illustrative examples include glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.