{"title":"RockTrack:3D Robust Multi-Camera-Ken 多目标跟踪框架","authors":"Xiaoyu Li, Peidong Li, Lijun Zhao, Dedong Liu, Jinghan Gao, Xian Wu, Yitao Wu, Dixiao Cui","doi":"arxiv-2409.11749","DOIUrl":null,"url":null,"abstract":"3D Multi-Object Tracking (MOT) obtains significant performance improvements\nwith the rapid advancements in 3D object detection, particularly in\ncost-effective multi-camera setups. However, the prevalent end-to-end training\napproach for multi-camera trackers results in detector-specific models,\nlimiting their versatility. Moreover, current generic trackers overlook the\nunique features of multi-camera detectors, i.e., the unreliability of motion\nobservations and the feasibility of visual information. To address these\nchallenges, we propose RockTrack, a 3D MOT method for multi-camera detectors.\nFollowing the Tracking-By-Detection framework, RockTrack is compatible with\nvarious off-the-shelf detectors. RockTrack incorporates a confidence-guided\npreprocessing module to extract reliable motion and image observations from\ndistinct representation spaces from a single detector. These observations are\nthen fused in an association module that leverages geometric and appearance\ncues to minimize mismatches. The resulting matches are propagated through a\nstaged estimation process, forming the basis for heuristic noise modeling.\nAdditionally, we introduce a novel appearance similarity metric for explicitly\ncharacterizing object affinities in multi-camera settings. 
RockTrack achieves\nstate-of-the-art performance on the nuScenes vision-only tracking leaderboard\nwith 59.1% AMOTA while demonstrating impressive computational efficiency.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":"16 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"RockTrack: A 3D Robust Multi-Camera-Ken Multi-Object Tracking Framework\",\"authors\":\"Xiaoyu Li, Peidong Li, Lijun Zhao, Dedong Liu, Jinghan Gao, Xian Wu, Yitao Wu, Dixiao Cui\",\"doi\":\"arxiv-2409.11749\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"3D Multi-Object Tracking (MOT) obtains significant performance improvements\\nwith the rapid advancements in 3D object detection, particularly in\\ncost-effective multi-camera setups. However, the prevalent end-to-end training\\napproach for multi-camera trackers results in detector-specific models,\\nlimiting their versatility. Moreover, current generic trackers overlook the\\nunique features of multi-camera detectors, i.e., the unreliability of motion\\nobservations and the feasibility of visual information. To address these\\nchallenges, we propose RockTrack, a 3D MOT method for multi-camera detectors.\\nFollowing the Tracking-By-Detection framework, RockTrack is compatible with\\nvarious off-the-shelf detectors. RockTrack incorporates a confidence-guided\\npreprocessing module to extract reliable motion and image observations from\\ndistinct representation spaces from a single detector. These observations are\\nthen fused in an association module that leverages geometric and appearance\\ncues to minimize mismatches. 
The resulting matches are propagated through a\\nstaged estimation process, forming the basis for heuristic noise modeling.\\nAdditionally, we introduce a novel appearance similarity metric for explicitly\\ncharacterizing object affinities in multi-camera settings. RockTrack achieves\\nstate-of-the-art performance on the nuScenes vision-only tracking leaderboard\\nwith 59.1% AMOTA while demonstrating impressive computational efficiency.\",\"PeriodicalId\":501130,\"journal\":{\"name\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"volume\":\"16 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11749\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11749","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
3D Multi-Object Tracking (MOT) has achieved significant performance improvements alongside rapid advances in 3D object detection, particularly with cost-effective multi-camera setups. However, the prevalent end-to-end training approach for multi-camera trackers yields detector-specific models, limiting their versatility. Moreover, current generic trackers overlook the unique characteristics of multi-camera detectors, i.e., the unreliability of motion observations and the feasibility of visual information. To address these challenges, we propose RockTrack, a 3D MOT method for multi-camera detectors. Following the Tracking-By-Detection framework, RockTrack is compatible with various off-the-shelf detectors. RockTrack incorporates a confidence-guided preprocessing module to extract reliable motion and image observations from distinct representation spaces of a single detector. These observations are then fused in an association module that leverages geometric and appearance cues to minimize mismatches. The resulting matches are propagated through a staged estimation process, forming the basis for heuristic noise modeling. Additionally, we introduce a novel appearance similarity metric that explicitly characterizes object affinities in multi-camera settings. RockTrack achieves state-of-the-art performance on the nuScenes vision-only tracking leaderboard with 59.1% AMOTA while demonstrating impressive computational efficiency.
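To make the abstract's pipeline concrete, the following is a minimal sketch, in the spirit of the described approach, of one tracking-by-detection step: detections are first filtered by detector confidence (the "confidence-guided preprocessing" idea), then associated with existing tracks using a cost that fuses a geometric cue (center distance) with an appearance cue (cosine similarity). All names, thresholds, weights, and the greedy matcher are illustrative assumptions, not the paper's actual implementation.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity between two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def associate(tracks, detections, conf_thresh=0.3, w_geo=0.5, max_cost=1.0):
    """Greedy one-to-one matching of detections to tracks.

    Each track/detection is a dict with 'center' (x, y) and 'feat'
    (appearance vector); detections also carry 'score' (confidence).
    Returns (matches, unmatched_detection_indices).
    """
    # Confidence-guided preprocessing: keep only reliable observations.
    dets = [d for d in detections if d["score"] >= conf_thresh]

    # Fused association cost: geometric distance plus appearance distance.
    pairs = []
    for ti, t in enumerate(tracks):
        for di, d in enumerate(dets):
            geo = math.dist(t["center"], d["center"])        # geometric cue
            app = 1.0 - cosine_sim(t["feat"], d["feat"])     # appearance cue
            pairs.append((w_geo * geo + (1.0 - w_geo) * app, ti, di))

    # Greedy assignment: cheapest admissible pairs first, each side used once.
    pairs.sort()
    used_t, used_d, matches = set(), set(), []
    for cost, ti, di in pairs:
        if cost <= max_cost and ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    unmatched = [di for di in range(len(dets)) if di not in used_d]
    return matches, unmatched
```

In a full tracker, matched pairs would update track states, while unmatched high-confidence detections would spawn new tracks; a real system would also replace the greedy matcher with an optimal assignment (e.g., the Hungarian algorithm).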