Pro2Diff: Proposal Propagation for Multi-Object Tracking via the Diffusion Model

Hongmin Liu;Canbin Zhang;Bin Fan;Jinglin Xu
Journal: IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society), vol. 33, pp. 6508-6520
Publication date: 2024-11-14
Publication type: Journal Article
DOI: 10.1109/TIP.2024.3494600
URL: https://ieeexplore.ieee.org/document/10753449/
Citation count: 0

Abstract

Multi-object tracking (MOT) aims to estimate the bounding boxes and ID labels of objects in videos. A central challenge is alleviating the competitive learning between the detection and tracking subtasks. Two-stage Tracking-By-Detection (TBD) methods address it by optimizing the two subtasks separately, whereas single-stage Joint Detection and Tracking (JDT) methods finely adjust complex network architectures within an end-to-end pipeline. In this paper, we propose a new MOT method, Proposal Propagation via Diffusion Models (Pro2Diff), which integrates a diffusion model into proposal propagation for multi-object tracking and focuses on the model training process rather than on complex network design. Specifically, taking a generative approach, Pro2Diff generates a considerable number of noisy proposals for the tracking image sequence in the forward process. It then learns the discrepancies between these noisy proposals and the actual bounding boxes of the tracked objects, gradually refining the noisy proposals to obtain the initial sequence of real tracked objects. Introducing the denoising diffusion process into multi-object tracking leads to three further findings: 1) generative methods can effectively handle multi-object tracking tasks; 2) self-conditional proposal propagation, which we propose, effectively improves model performance during inference without modifying the model structure; 3) adjusting the number of proposals and the number of iterations appropriately for different tracking sequences yields the best model performance. Extensive experiments on the MOT17 and DanceTrack datasets demonstrate that Pro2Diff outperforms current end-to-end multi-object tracking methods. It achieves 61.9 HOTA on DanceTrack and 57.6 HOTA on MOT17, results that are competitive with JDT approaches.
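
The forward process described in the abstract, in which ground-truth boxes are turned into noisy proposals, can be illustrated with the standard denoising-diffusion noising rule x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the cosine schedule, the (cx, cy, w, h) box format normalized to [0, 1], the padding of ground truth with random boxes, and the names `cosine_alpha_bar` and `make_noisy_proposals` are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def cosine_alpha_bar(num_steps: int, s: float = 0.008) -> np.ndarray:
    """Cumulative signal-retention schedule alpha_bar[0..num_steps].

    The cosine form is an assumed choice; the abstract names no schedule.
    """
    t = np.linspace(0, num_steps, num_steps + 1) / num_steps
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]  # normalized so that alpha_bar[0] == 1 (no noise at t = 0)


def make_noisy_proposals(gt_boxes, num_proposals, t, alpha_bar, rng):
    """Pad ground-truth boxes to a fixed count, then apply the forward noising rule.

    gt_boxes: array-like of shape (N, 4), assumed (cx, cy, w, h) in [0, 1].
    Returns (x_t, x0): the noisy proposals and the clean padded boxes.
    """
    gt_boxes = np.asarray(gt_boxes, dtype=np.float64).reshape(-1, 4)
    # Hypothetical padding strategy: fill the remaining slots with random boxes.
    n_pad = max(num_proposals - len(gt_boxes), 0)
    padding = rng.uniform(0.0, 1.0, size=(n_pad, 4))
    x0 = np.concatenate([gt_boxes, padding], axis=0)[:num_proposals]
    # Standard DDPM forward step: blend clean boxes with Gaussian noise.
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, x0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = cosine_alpha_bar(num_steps=1000)
    gt = [[0.5, 0.5, 0.2, 0.3], [0.3, 0.4, 0.1, 0.1]]
    x_t, x0 = make_noisy_proposals(gt, num_proposals=8, t=500, alpha_bar=alpha_bar, rng=rng)
    print(x_t.shape)
```

In this sketch the number of proposals and the time step are free parameters, which loosely mirrors the abstract's third finding that the numbers of proposals and iterations can be tuned per tracking sequence. A trained denoising network, not shown here, would be needed to map `x_t` back toward the actual boxes.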