RockTrack: A 3D Robust Multi-Camera-Ken Multi-Object Tracking Framework

Xiaoyu Li, Peidong Li, Lijun Zhao, Dedong Liu, Jinghan Gao, Xian Wu, Yitao Wu, Dixiao Cui
{"title":"RockTrack: A 3D Robust Multi-Camera-Ken Multi-Object Tracking Framework","authors":"Xiaoyu Li, Peidong Li, Lijun Zhao, Dedong Liu, Jinghan Gao, Xian Wu, Yitao Wu, Dixiao Cui","doi":"arxiv-2409.11749","DOIUrl":null,"url":null,"abstract":"3D Multi-Object Tracking (MOT) obtains significant performance improvements\nwith the rapid advancements in 3D object detection, particularly in\ncost-effective multi-camera setups. However, the prevalent end-to-end training\napproach for multi-camera trackers results in detector-specific models,\nlimiting their versatility. Moreover, current generic trackers overlook the\nunique features of multi-camera detectors, i.e., the unreliability of motion\nobservations and the feasibility of visual information. To address these\nchallenges, we propose RockTrack, a 3D MOT method for multi-camera detectors.\nFollowing the Tracking-By-Detection framework, RockTrack is compatible with\nvarious off-the-shelf detectors. RockTrack incorporates a confidence-guided\npreprocessing module to extract reliable motion and image observations from\ndistinct representation spaces from a single detector. These observations are\nthen fused in an association module that leverages geometric and appearance\ncues to minimize mismatches. The resulting matches are propagated through a\nstaged estimation process, forming the basis for heuristic noise modeling.\nAdditionally, we introduce a novel appearance similarity metric for explicitly\ncharacterizing object affinities in multi-camera settings. RockTrack achieves\nstate-of-the-art performance on the nuScenes vision-only tracking leaderboard\nwith 59.1% AMOTA while demonstrating impressive computational efficiency.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11749","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

3D Multi-Object Tracking (MOT) has achieved significant performance improvements with the rapid advancement of 3D object detection, particularly in cost-effective multi-camera setups. However, the prevalent end-to-end training approach for multi-camera trackers results in detector-specific models, limiting their versatility. Moreover, current generic trackers overlook the unique features of multi-camera detectors, i.e., the unreliability of motion observations and the feasibility of visual information. To address these challenges, we propose RockTrack, a 3D MOT method for multi-camera detectors. Following the Tracking-By-Detection framework, RockTrack is compatible with various off-the-shelf detectors. RockTrack incorporates a confidence-guided preprocessing module to extract reliable motion and image observations from distinct representation spaces of a single detector. These observations are then fused in an association module that leverages geometric and appearance cues to minimize mismatches. The resulting matches are propagated through a staged estimation process, forming the basis for heuristic noise modeling. Additionally, we introduce a novel appearance similarity metric for explicitly characterizing object affinities in multi-camera settings. RockTrack achieves state-of-the-art performance on the nuScenes vision-only tracking leaderboard with 59.1% AMOTA while demonstrating impressive computational efficiency.
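The abstract describes a Tracking-By-Detection pipeline: confidence-guided preprocessing of raw detections, followed by an association stage that fuses geometric and appearance cues before track update. The sketch below illustrates that general structure in Python using NumPy and SciPy's Hungarian solver. All function names, cost terms, weights, and thresholds are illustrative assumptions for exposition; they are not taken from the RockTrack implementation, and the paper's staged estimation and multi-camera appearance metric are not reproduced here.

```python
# Minimal sketch of a Tracking-By-Detection association step in the spirit of
# the pipeline described above. Names, weights, and thresholds are assumptions,
# not RockTrack's actual design.
import numpy as np
from scipy.optimize import linear_sum_assignment


def confidence_filter(boxes, scores, feats, score_thresh=0.3):
    """Confidence-guided preprocessing: keep detections above a score
    threshold (threshold value is an illustrative assumption)."""
    keep = scores >= score_thresh
    return boxes[keep], scores[keep], feats[keep]


def geometric_cost(track_boxes, det_boxes):
    """Geometric cue: Euclidean distance between predicted track centers
    and detection centers in the ground (x, y) plane."""
    diff = track_boxes[:, None, :2] - det_boxes[None, :, :2]
    return np.linalg.norm(diff, axis=-1)


def appearance_cost(track_feats, det_feats):
    """Appearance cue: 1 - cosine similarity of L2-normalized embeddings."""
    t = track_feats / (np.linalg.norm(track_feats, axis=1, keepdims=True) + 1e-8)
    d = det_feats / (np.linalg.norm(det_feats, axis=1, keepdims=True) + 1e-8)
    return 1.0 - t @ d.T


def associate(track_boxes, track_feats, det_boxes, det_feats,
              w_geo=1.0, w_app=2.0, max_cost=3.0):
    """Fuse geometric and appearance costs and solve bipartite matching.
    Weights and the gating value are illustrative."""
    cost = (w_geo * geometric_cost(track_boxes, det_boxes)
            + w_app * appearance_cost(track_feats, det_feats))
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]
```

In a full tracker, unmatched detections would spawn new tracks, unmatched tracks would be aged out, and matched tracks would be refined by a motion filter; those bookkeeping and estimation steps are omitted here for brevity.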