Robust Multi-Object Tracking with pseudo-information guided motion and enhanced semantic vision

IF 7.5 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Expert Systems with Applications Pub Date : 2025-05-10 Epub Date: 2025-02-17 DOI:10.1016/j.eswa.2025.126846

Yukuan Zhang , Shengsheng Wang , Zihao Fu , Limin Zhao , Jiarui Zhao

{"title":"Robust Multi-Object Tracking with pseudo-information guided motion and enhanced semantic vision","authors":"Yukuan Zhang , Shengsheng Wang , Zihao Fu , Limin Zhao , Jiarui Zhao","doi":"10.1016/j.eswa.2025.126846","DOIUrl":null,"url":null,"abstract":"<div><div>The key to Multi-Object Tracking is to differentiate multiple instances in a video sequence and maintain their identity continuity. To achieve this goal, most methods model the motion or appearance cues of instances. However, when faced with complex scenarios like camera motion, occlusion, and crowding, trackers often lack discriminative capabilities. In this paper, we propose a robust tracker, named RccTrack, that combines motion cues guided by pseudo information and enhanced visual clues to overcome the aforementioned issues. Specifically, pseudo-observation information is constructed for guiding trajectory localization and generate interference-resistant trajectories. Pseudo-state information is constructed for guiding the calculation of inter-frame target motion directions. These pseudo-information is used to enhance the discriminative power of the motion cues. For visual cues, a semantic fusion network is designed to extract strong discriminative appearance information and store them in our hierarchical fusion embedding clusters, thus enhancing the discriminative power of the visual cues. In addition, we design the cascade matching method, which performs the association task based on the trajectory length information to distinguish confusing targets. In the matching stage, the two cues mentioned above are combined to enhance the discriminative power of the tracker. Experimental results demonstrate that RccTrack achieves state-of-the-art performance on MOT16, MOT17, MOT20, and DanceTrack benchmarks.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"273 ","pages":"Article 126846"},"PeriodicalIF":7.5000,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems with Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0957417425004683","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/17 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

The key to Multi-Object Tracking is to differentiate multiple instances in a video sequence and maintain their identity continuity. To achieve this goal, most methods model the motion or appearance cues of instances. However, when faced with complex scenarios like camera motion, occlusion, and crowding, trackers often lack discriminative capabilities. In this paper, we propose a robust tracker, named RccTrack, that combines motion cues guided by pseudo information and enhanced visual clues to overcome the aforementioned issues. Specifically, pseudo-observation information is constructed for guiding trajectory localization and generate interference-resistant trajectories. Pseudo-state information is constructed for guiding the calculation of inter-frame target motion directions. These pseudo-information is used to enhance the discriminative power of the motion cues. For visual cues, a semantic fusion network is designed to extract strong discriminative appearance information and store them in our hierarchical fusion embedding clusters, thus enhancing the discriminative power of the visual cues. In addition, we design the cascade matching method, which performs the association task based on the trajectory length information to distinguish confusing targets. In the matching stage, the two cues mentioned above are combined to enhance the discriminative power of the tracker. Experimental results demonstrate that RccTrack achieves state-of-the-art performance on MOT16, MOT17, MOT20, and DanceTrack benchmarks.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

伪信息引导运动和增强语义视觉的鲁棒多目标跟踪

多目标跟踪的关键是区分视频序列中的多个实例并保持其身份连续性。为了实现这一目标，大多数方法对实例的运动或外观线索进行建模。然而，当面对摄像机运动、遮挡和拥挤等复杂场景时，跟踪器往往缺乏辨别能力。在本文中，我们提出了一种鲁棒跟踪器RccTrack，它结合了伪信息引导的运动线索和增强的视觉线索来克服上述问题。构造伪观测信息，引导弹道定位，生成抗干扰弹道。构造伪状态信息，指导帧间目标运动方向的计算。这些伪信息被用来增强运动线索的辨别能力。对于视觉线索，设计语义融合网络提取强判别性的外观信息，并将其存储在我们的分层融合嵌入聚类中，从而增强视觉线索的判别能力。此外，我们设计了级联匹配方法，该方法基于轨迹长度信息执行关联任务，以区分混淆目标。在匹配阶段，将上述两种线索结合起来，增强跟踪器的识别能力。实验结果表明，RccTrack在MOT16、MOT17、MOT20和DanceTrack基准测试中达到了最先进的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Expert Systems with Applications 工程技术-工程：电子与电气

CiteScore

13.80

自引率

10.60%

发文量

2045

审稿时长

8.7 months

期刊介绍： Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.