Visual tracking with screening region enrichment and target validation

IF 2.7 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE International Journal of Machine Learning and Cybernetics Pub Date : 2024-09-08 DOI:10.1007/s13042-024-02346-6

Yiqiu Sun, Dongming Zhou, Kaixiang Yan

{"title":"Visual tracking with screening region enrichment and target validation","authors":"Yiqiu Sun, Dongming Zhou, Kaixiang Yan","doi":"10.1007/s13042-024-02346-6","DOIUrl":null,"url":null,"abstract":"<p>The introduction of the one-stream one-stage framework has led to remarkable advances in visual object tracking, resulting in exceptional tracking performance. Most existing one-stream one-stage tracking pipelines have achieved a relative balance between accuracy and speed. However, they focus solely on integrating feature learning and relational modelling. In complex scenes, the tracking performance often falls short due to confounding factors such as changes in target scale, occlusion, and fast motion. In these cases, numerous trackers cannot sufficiently exploit the target feature information and face the dilemma of information loss. To address these challenges, we propose a screening enrichment for transformer-based tracking. Our method incorporates a screening enrichment module as an additional processing operation in the integration of feature learning and relational modelling. The module effectively distinguishes target areas within the search regions. It also enriches the associations between tokens of target area information. In addition, we introduce our box validation module. This module uses the target position information from the previous frame to validate and revise the target position in the current frame. This process enables more accurate target localization. Through these innovations, we have developed a powerful and efficient tracker. It achieves state-of-the-art performance on six benchmark datasets, including GOT-10K, LaSOT, TrackingNet, UAV123, TNL2K and VOT2020. On the GOT-10K benchmarks, Specifically, on the GOT-10K benchmarks, our proposed tracker reaches an impressive Success Rate (<span>\\(S{{R}_{0.5}}\\)</span>) of 85.4 and an Average Overlap (AO) of 75.3. Experimental results show that our proposed tracker outperforms other state-of-the-art trackers in terms of tracking accuracy.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"405 1","pages":""},"PeriodicalIF":2.7000,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Machine Learning and Cybernetics","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s13042-024-02346-6","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

The introduction of the one-stream one-stage framework has led to remarkable advances in visual object tracking, resulting in exceptional tracking performance. Most existing one-stream one-stage tracking pipelines have achieved a relative balance between accuracy and speed. However, they focus solely on integrating feature learning and relational modelling. In complex scenes, the tracking performance often falls short due to confounding factors such as changes in target scale, occlusion, and fast motion. In these cases, numerous trackers cannot sufficiently exploit the target feature information and face the dilemma of information loss. To address these challenges, we propose a screening enrichment for transformer-based tracking. Our method incorporates a screening enrichment module as an additional processing operation in the integration of feature learning and relational modelling. The module effectively distinguishes target areas within the search regions. It also enriches the associations between tokens of target area information. In addition, we introduce our box validation module. This module uses the target position information from the previous frame to validate and revise the target position in the current frame. This process enables more accurate target localization. Through these innovations, we have developed a powerful and efficient tracker. It achieves state-of-the-art performance on six benchmark datasets, including GOT-10K, LaSOT, TrackingNet, UAV123, TNL2K and VOT2020. On the GOT-10K benchmarks, Specifically, on the GOT-10K benchmarks, our proposed tracker reaches an impressive Success Rate (\(S{{R}_{0.5}}\)) of 85.4 and an Average Overlap (AO) of 75.3. Experimental results show that our proposed tracker outperforms other state-of-the-art trackers in terms of tracking accuracy.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

通过筛选区域富集和目标验证进行视觉跟踪

单流单级框架的引入在视觉物体跟踪领域取得了显著进步，带来了卓越的跟踪性能。大多数现有的单流单级跟踪管道都在精度和速度之间取得了相对平衡。然而，它们只关注特征学习和关系建模的整合。在复杂场景中，由于目标尺度变化、遮挡和快速运动等干扰因素，跟踪性能往往不尽如人意。在这种情况下，众多跟踪器无法充分利用目标特征信息，面临信息丢失的困境。为了应对这些挑战，我们提出了一种基于变压器跟踪的筛选富集方法。我们的方法在特征学习和关系建模的整合过程中加入了筛选富集模块，作为额外的处理操作。该模块能有效区分搜索区域内的目标区域。它还能丰富目标区域信息词块之间的关联。此外，我们还引入了方框验证模块。该模块使用前一帧的目标位置信息来验证和修正当前帧的目标位置。这一过程可实现更精确的目标定位。通过这些创新，我们开发出了功能强大且高效的跟踪器。它在 GOT-10K、LaSOT、TrackingNet、UAV123、TNL2K 和 VOT2020 等六个基准数据集上实现了最先进的性能。具体来说，在 GOT-10K 基准数据集上，我们提出的跟踪器的成功率（S{R}_{0.5}}/）达到了令人印象深刻的 85.4，平均重叠率（AO）达到了 75.3。实验结果表明，我们提出的跟踪器在跟踪精度方面优于其他最先进的跟踪器。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

International Journal of Machine Learning and Cybernetics COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-

CiteScore

7.90

自引率

10.70%

发文量

225

期刊介绍： Cybernetics is concerned with describing complex interactions and interrelationships between systems which are omnipresent in our daily life. Machine Learning discovers fundamental functional relationships between variables and ensembles of variables in systems. The merging of the disciplines of Machine Learning and Cybernetics is aimed at the discovery of various forms of interaction between systems through diverse mechanisms of learning from data. The International Journal of Machine Learning and Cybernetics (IJMLC) focuses on the key research problems emerging at the junction of machine learning and cybernetics and serves as a broad forum for rapid dissemination of the latest advancements in the area. The emphasis of IJMLC is on the hybrid development of machine learning and cybernetics schemes inspired by different contributing disciplines such as engineering, mathematics, cognitive sciences, and applications. New ideas, design alternatives, implementations and case studies pertaining to all the aspects of machine learning and cybernetics fall within the scope of the IJMLC. Key research areas to be covered by the journal include: Machine Learning for modeling interactions between systems Pattern Recognition technology to support discovery of system-environment interaction Control of system-environment interactions Biochemical interaction in biological and biologically-inspired systems Learning for improvement of communication schemes between systems