Motion estimation and multi-stage association for tracking-by-detection

IF 4.6 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Complex & Intelligent Systems · Pub Date: 2023-11-22 · DOI: 10.1007/s40747-023-01273-3

Ye Li, Lei Wu, Yiping Chen, Xinzhong Wang, Guangqiang Yin, Zhiguo Wang

Citations: 0

Abstract

Multi-object tracking (MOT) aims to locate and identify objects in videos. As deep learning has brought excellent performance to object detection, tracking-by-detection (TBD) has gradually become a mainstream tracking framework. However, the current TBD framework still has several drawbacks: (1) In the detection part, bounding boxes are predicted inaccurately because the actual pedestrian aspect ratio in surveillance scenes is overlooked. (2) In the motion prediction part, the width of the bounding box in the next frame may be predicted indirectly through the aspect ratio, which increases the width prediction error. (3) In the data association part, association is performed only for high-confidence detection boxes, and low-confidence boxes caused by occlusion are discarded, resulting in fragmented trajectories. To address these issues, we propose a multi-target tracking model incorporating motion estimation and multi-stage association (MEMA). First, the aspect ratio of the ground-truth bounding box is introduced to improve the fit between the detection and the ground-truth bounding box, and we design an elliptical Gaussian kernel to improve the positioning accuracy of the object center point. Then, the prediction state vector of the Kalman filter is modified to predict the width and its corresponding velocity directly. This reduces the width error of the predicted box and eliminates the velocity error of the motion estimation, yielding a predicted bounding box better suited to pedestrians. Finally, we propose a multi-stage association strategy to associate boxes of different confidence levels. Without using appearance features, the strategy reduces the impact of occlusion and improves tracking performance. On the MOT17 test set, the proposed method achieves a MOTA of 74.3% and an IDF1 of 72.4%, outperforming the current SOTA.
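The motion and association ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything beyond what the abstract states is an assumption made for illustration: the state layout `[cx, cy, w, h, v_cx, v_cy, v_w, v_h]`, the constant-velocity model, the omission of covariance propagation, the use of exactly two stages, the thresholds `high`, `low` and `min_iou`, and greedy IoU matching as a stand-in for whatever assignment algorithm the paper uses. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), assumed layout


def predict_state(state: List[float], dt: float = 1.0) -> List[float]:
    """Constant-velocity prediction of the state mean (covariance omitted).

    state = [cx, cy, w, h, v_cx, v_cy, v_w, v_h]. The width is advanced by
    its own velocity v_w instead of being derived from aspect ratio * height.
    """
    pos, vel = state[:4], state[4:]
    return [p + dt * v for p, v in zip(pos, vel)] + list(vel)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two centre-format boxes."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(iw, 0.0) * max(ih, 0.0)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], dets: List[Tuple[int, Box]],
                 min_iou: float) -> Dict[int, int]:
    """Pair tracks with detections in descending IoU order.

    Greedy matching is a simplification chosen for this sketch.
    """
    pairs = sorted(
        ((iou(tb, db), tid, did) for tid, tb in tracks.items() for did, db in dets),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, did in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap even less
        if tid in matches or did in used:
            continue
        matches[tid] = did
        used.add(did)
    return matches


def associate(tracks: Dict[int, Box], detections: List[Tuple[Box, float]],
              high: float = 0.6, low: float = 0.1,
              min_iou: float = 0.3) -> Dict[int, int]:
    """Two-stage association: high-confidence boxes first, then low-confidence ones."""
    indexed = list(enumerate(detections))
    high_dets = [(i, box) for i, (box, s) in indexed if s >= high]
    low_dets = [(i, box) for i, (box, s) in indexed if low <= s < high]
    matches = greedy_match(tracks, high_dets, min_iou)
    # Tracks left unmatched get a second chance against low-confidence boxes,
    # which may come from partially occluded objects.
    remaining = {t: b for t, b in tracks.items() if t not in matches}
    matches.update(greedy_match(remaining, low_dets, min_iou))
    return matches


# Hypothetical example: two tracks, one confident and one occluded detection.
states = {1: [100, 100, 40, 80, 5, 0, 2, 0], 2: [300, 100, 40, 80, 0, 0, 0, 0]}
predicted = {tid: tuple(predict_state(s)[:4]) for tid, s in states.items()}
detections = [((106, 100, 42, 80), 0.9), ((301, 101, 38, 78), 0.3)]
result = associate(predicted, detections)
print(result)
```

In this made-up example, track 1 is matched to the high-confidence detection in the first stage, and track 2 is recovered in the second stage from the low-confidence box that a high-confidence-only tracker would have discarded, so `result` is `{1: 0, 2: 1}`.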

Source journal

Complex & Intelligent Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 9.60

Self-citation rate: 10.30%

Articles published: 297

Journal description: Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.