Motion estimation and multi-stage association for tracking-by-detection

IF 5; Zone 2 (Computer Science); Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Complex & Intelligent Systems; Pub Date: 2023-11-22; DOI: 10.1007/s40747-023-01273-3
Ye Li, Lei Wu, Yiping Chen, Xinzhong Wang, Guangqiang Yin, Zhiguo Wang
Citations: 0

Abstract

Multi-object tracking (MOT) aims to locate and identify objects in videos. As deep learning has brought excellent performance to object detection, tracking-by-detection (TBD) has gradually become a mainstream tracking framework. However, the current TBD framework still has several drawbacks: (1) In the detection part, bounding boxes are predicted inaccurately because the actual aspect ratio of pedestrians in surveillance scenes is overlooked. (2) In the motion prediction part, the width of the bounding box in the next frame may be predicted indirectly through the aspect ratio, which increases the width prediction error. (3) In the data association part, association is performed only for high-confidence detection boxes, and the low-confidence boxes caused by occlusion are discarded, resulting in fragmented trajectories. To address these issues, we propose a multi-target tracking model incorporating motion estimation and multi-stage association (MEMA). First, the aspect ratio of the ground-truth bounding box is introduced to improve the fit between the detected and ground-truth bounding boxes, and we design an elliptical Gaussian kernel to improve the localization accuracy of the object center point. Then, the state vector of the Kalman filter is modified to predict the width and its corresponding velocity directly. This reduces the width error of the predicted box and eliminates the velocity error of the motion estimation, which leads to predicted bounding boxes that are better suited to pedestrians. Finally, we propose a multi-stage association strategy to associate detection boxes of different confidence levels. Without using appearance features, the strategy reduces the impact of occlusion and improves tracking performance. On the MOT17 test set, the proposed method achieves a MOTA of 74.3% and an IDF1 of 72.4%, outperforming the current SOTA.
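
The two tracking-side ideas in the abstract, a Kalman state that carries the width and its velocity directly, and association that also considers low-confidence boxes, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract gives no matrices, noise settings, thresholds, or number of stages, so the state layout `[cx, cy, w, h, vcx, vcy, vw, vh]`, the two-stage split, the greedy IoU matcher (used here in place of whatever assignment method the paper uses), and every function name and default value below are assumptions made for illustration.

```python
import numpy as np

def make_cv_model(dt=1.0):
    """Constant-velocity model for the assumed state [cx, cy, w, h, vcx, vcy, vw, vh]."""
    F = np.eye(8)
    F[:4, 4:] = dt * np.eye(4)  # each of cx, cy, w, h advances by its own velocity
    H = np.eye(4, 8)            # the detector measures [cx, cy, w, h] only
    return F, H

def kf_predict(x, P, F, Q):
    """Standard Kalman prediction step."""
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z, H, R):
    """Standard Kalman correction step with measurement z = [cx, cy, w, h]."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ (z - H @ x), (np.eye(len(x)) - K @ H) @ P

def iou(a, b):
    """IoU of two boxes in [cx, cy, w, h] format."""
    ix = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    iy = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(ix, 0.0) * max(iy, 0.0)
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def greedy_match(tracks, dets, iou_min):
    """Greedy IoU matching (a simplification chosen for brevity)."""
    pairs = sorted(
        ((iou(t, d), i, j) for i, t in enumerate(tracks) for j, d in enumerate(dets)),
        reverse=True,
    )
    used_t, used_d, matches = set(), set(), []
    for score, i, j in pairs:
        if score < iou_min:
            break  # remaining pairs overlap even less
        if i in used_t or j in used_d:
            continue
        matches.append((i, j))
        used_t.add(i)
        used_d.add(j)
    return matches

def staged_association(track_boxes, dets, scores, high=0.6, low=0.1, iou_min=0.3):
    """Stage 1: high-confidence boxes. Stage 2: leftover tracks vs. low-confidence boxes.

    The thresholds are hypothetical values, not taken from the paper.
    """
    high_idx = [j for j, s in enumerate(scores) if s >= high]
    low_idx = [j for j, s in enumerate(scores) if low <= s < high]
    stage1 = greedy_match(track_boxes, [dets[j] for j in high_idx], iou_min)
    matches = [(i, high_idx[j]) for i, j in stage1]
    matched = {i for i, _ in matches}
    rest = [i for i in range(len(track_boxes)) if i not in matched]
    stage2 = greedy_match([track_boxes[i] for i in rest], [dets[j] for j in low_idx], iou_min)
    return matches + [(rest[i], low_idx[j]) for i, j in stage2]
```

Because the width has its own velocity component in this layout, the predicted width is simply `w + vw * dt` rather than a product of a predicted aspect ratio and a predicted height. In the second stage, a track that found no high-confidence partner can still be continued by an occluded, low-scoring detection instead of being broken off.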
