RockTrack: A 3D Robust Multi-Camera-Ken Multi-Object Tracking Framework

Xiaoyu Li, Peidong Li, Lijun Zhao, Dedong Liu, Jinghan Gao, Xian Wu, Yitao Wu, Dixiao Cui
{"title":"RockTrack: A 3D Robust Multi-Camera-Ken Multi-Object Tracking Framework","authors":"Xiaoyu Li, Peidong Li, Lijun Zhao, Dedong Liu, Jinghan Gao, Xian Wu, Yitao Wu, Dixiao Cui","doi":"arxiv-2409.11749","DOIUrl":null,"url":null,"abstract":"3D Multi-Object Tracking (MOT) obtains significant performance improvements\nwith the rapid advancements in 3D object detection, particularly in\ncost-effective multi-camera setups. However, the prevalent end-to-end training\napproach for multi-camera trackers results in detector-specific models,\nlimiting their versatility. Moreover, current generic trackers overlook the\nunique features of multi-camera detectors, i.e., the unreliability of motion\nobservations and the feasibility of visual information. To address these\nchallenges, we propose RockTrack, a 3D MOT method for multi-camera detectors.\nFollowing the Tracking-By-Detection framework, RockTrack is compatible with\nvarious off-the-shelf detectors. RockTrack incorporates a confidence-guided\npreprocessing module to extract reliable motion and image observations from\ndistinct representation spaces from a single detector. These observations are\nthen fused in an association module that leverages geometric and appearance\ncues to minimize mismatches. The resulting matches are propagated through a\nstaged estimation process, forming the basis for heuristic noise modeling.\nAdditionally, we introduce a novel appearance similarity metric for explicitly\ncharacterizing object affinities in multi-camera settings. RockTrack achieves\nstate-of-the-art performance on the nuScenes vision-only tracking leaderboard\nwith 59.1% AMOTA while demonstrating impressive computational efficiency.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11749","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

3D Multi-Object Tracking (MOT) has achieved significant performance improvements with the rapid advancement of 3D object detection, particularly in cost-effective multi-camera setups. However, the prevalent end-to-end training approach for multi-camera trackers results in detector-specific models, limiting their versatility. Moreover, current generic trackers overlook the unique features of multi-camera detectors, i.e., the unreliability of motion observations and the feasibility of visual information. To address these challenges, we propose RockTrack, a 3D MOT method for multi-camera detectors. Following the Tracking-By-Detection framework, RockTrack is compatible with various off-the-shelf detectors. RockTrack incorporates a confidence-guided preprocessing module to extract reliable motion and image observations from distinct representation spaces of a single detector. These observations are then fused in an association module that leverages geometric and appearance cues to minimize mismatches. The resulting matches are propagated through a staged estimation process, forming the basis for heuristic noise modeling. Additionally, we introduce a novel appearance similarity metric for explicitly characterizing object affinities in multi-camera settings. RockTrack achieves state-of-the-art performance on the nuScenes vision-only tracking leaderboard with 59.1% AMOTA while demonstrating impressive computational efficiency.
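The abstract describes a Tracking-By-Detection pipeline: confidence-guided preprocessing of raw detections, followed by an association stage that fuses geometric and appearance cues before track update. The sketch below illustrates that general structure in Python using NumPy and SciPy's Hungarian solver. All function names, cost terms, weights, and thresholds are illustrative assumptions for exposition; they are not taken from the RockTrack implementation, and the paper's staged estimation and multi-camera appearance metric are not reproduced here.

```python
# Minimal sketch of a Tracking-By-Detection association step in the spirit of
# the pipeline described above. Names, weights, and thresholds are assumptions,
# not RockTrack's actual design.
import numpy as np
from scipy.optimize import linear_sum_assignment


def confidence_filter(boxes, scores, feats, score_thresh=0.3):
    """Confidence-guided preprocessing: keep detections above a score
    threshold (threshold value is an illustrative assumption)."""
    keep = scores >= score_thresh
    return boxes[keep], scores[keep], feats[keep]


def geometric_cost(track_boxes, det_boxes):
    """Geometric cue: Euclidean distance between predicted track centers
    and detection centers in the ground (x, y) plane."""
    diff = track_boxes[:, None, :2] - det_boxes[None, :, :2]
    return np.linalg.norm(diff, axis=-1)


def appearance_cost(track_feats, det_feats):
    """Appearance cue: 1 - cosine similarity of L2-normalized embeddings."""
    t = track_feats / (np.linalg.norm(track_feats, axis=1, keepdims=True) + 1e-8)
    d = det_feats / (np.linalg.norm(det_feats, axis=1, keepdims=True) + 1e-8)
    return 1.0 - t @ d.T


def associate(track_boxes, track_feats, det_boxes, det_feats,
              w_geo=1.0, w_app=2.0, max_cost=3.0):
    """Fuse geometric and appearance costs and solve bipartite matching.
    Weights and the gating value are illustrative."""
    cost = (w_geo * geometric_cost(track_boxes, det_boxes)
            + w_app * appearance_cost(track_feats, det_feats))
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]
```

In a full tracker, unmatched detections would spawn new tracks, unmatched tracks would be aged out, and matched tracks would be refined by a motion filter; those bookkeeping and estimation steps are omitted here for brevity.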