Multi-object tracking using score-driven hierarchical association strategy between predicted tracklets and objects

IF 4.2 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Image and Vision Computing | Pub Date: 2024-10-19 | DOI: 10.1016/j.imavis.2024.105303

Tianyi Zhao, Guanci Yang, Yang Li, Minglang Lu, Haoran Sun

Image and Vision Computing, vol. 152, Article 105303. https://www.sciencedirect.com/science/article/pii/S0262885624004086

Citations: 0

Abstract

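Machine vision is one of the key technologies that enable human-centered embodied intelligence in intelligent robots. In complex dynamic scenes involving multiple people, multi-object tracking (MOT), which can accurately identify and track specific targets, strongly affects a robot's behavior perception and monitoring, its autonomous decision-making, and its ability to provide personalized humanoid services. To address target loss and identity switches caused by object scale variation and frequent overlaps during tracking, this paper presents ScoreMOT, a multi-object tracking method that uses a score-driven hierarchical association strategy between predicted tracklets and objects. First, a motion prediction of occluded objects based on bounding box variation (MPOBV) is proposed to estimate the positions of occluded objects; MPOBV models an object's motion state using its bounding box and confidence score. Then, a score-driven hierarchical association strategy between predicted tracklets and objects (SHAS) is proposed to associate them correctly in frequently overlapping scenes; SHAS associates predicted tracklets with detected objects of different confidence levels in different stages. ScoreMOT is compared with 16 state-of-the-art methods on the Multiple Object Tracking Benchmark 20 (MOT20) and DanceTrack datasets, and it outperforms all of the compared methods.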
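The abstract names the two components but gives no equations, so the following is only a hypothetical sketch of how a predict-then-associate-by-confidence step of this kind can look. It is not the authors' MPOBV or SHAS. Box-edge extrapolation stands in for "motion prediction based on bounding box variation", because extrapolating all four edges carries both translation and scale change forward. A ByteTrack-style two-stage split by detection score stands in for the hierarchical association. Greedy highest-IoU-first matching replaces the Hungarian algorithm. Every name, threshold (`high=0.6`, `low=0.1`, `iou_thresh=0.3`) and the score decay (`0.9`) is an assumption chosen for illustration.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Tracklet:
    track_id: int
    box: Box        # box at the last frame
    prev_box: Box   # box one frame earlier
    score: float    # confidence carried as part of the motion state (assumed)


def predict(t: Tracklet, decay: float = 0.9) -> Tracklet:
    """Hypothetical stand-in for MPOBV: extrapolate every box edge by its last
    frame-to-frame change, so translation AND scale variation both continue.
    The score decays each predicted frame (decay=0.9 is an assumption)."""
    new_box = tuple(c + (c - p) for c, p in zip(t.box, t.prev_box))
    return Tracklet(t.track_id, new_box, t.box, t.score * decay)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, dets, iou_thresh):
    """Greedy highest-IoU-first matching (a simple stand-in for the Hungarian
    algorithm). dets are (box, score) pairs. Returns (matches, unmatched
    track indices, unmatched detection indices)."""
    pairs = sorted(
        ((iou(t.box, d[0]), ti, di)
         for ti, t in enumerate(tracks) for di, d in enumerate(dets)),
        reverse=True,
    )
    used_t, used_d, matches = set(), set(), []
    for overlap, ti, di in pairs:
        if overlap < iou_thresh:
            break  # pairs are sorted, so no later pair can qualify
        if ti in used_t or di in used_d:
            continue
        used_t.add(ti)
        used_d.add(di)
        matches.append((ti, di))
    unmatched_t = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_d = [i for i in range(len(dets)) if i not in used_d]
    return matches, unmatched_t, unmatched_d


def associate(tracks, dets, high=0.6, low=0.1, iou_thresh=0.3):
    """Two-stage association by detection confidence (thresholds assumed).
    Returns (track_id, det_index) matches and the indices of unmatched
    high-confidence detections, which could seed new tracklets."""
    predicted = [predict(t) for t in tracks]
    high_idx = [i for i, (_, s) in enumerate(dets) if s >= high]
    low_idx = [i for i, (_, s) in enumerate(dets) if low <= s < high]

    # Stage 1: all predicted tracklets vs. high-confidence detections.
    m1, rest_t, rest_high = greedy_match(predicted, [dets[i] for i in high_idx], iou_thresh)
    matches = [(predicted[ti].track_id, high_idx[di]) for ti, di in m1]

    # Stage 2: leftover tracklets (often occluded) vs. low-confidence detections.
    rest = [predicted[i] for i in rest_t]
    m2, _, _ = greedy_match(rest, [dets[i] for i in low_idx], iou_thresh)
    matches += [(rest[ti].track_id, low_idx[di]) for ti, di in m2]

    return matches, [high_idx[i] for i in rest_high]


tracks = [
    Tracklet(1, (0, 0, 10, 10), (-2, 0, 8, 10), 0.9),     # moving right
    Tracklet(2, (50, 50, 60, 60), (50, 50, 60, 60), 0.8),  # stationary
]
dets = [((2, 0, 12, 10), 0.9), ((50, 50, 60, 60), 0.3), ((100, 100, 110, 110), 0.95)]
matches, new = associate(tracks, dets)
print(matches, new)  # → [(1, 0), (2, 1)] [2]
```

In the usage example, tracklet 2 has only a low-confidence detection (score 0.3), as can happen when an object is partly occluded. The first stage skips that detection, and the second stage recovers it instead of dropping the track. This is the kind of identity-preserving behavior the abstract attributes to associating detections of different confidence in different stages.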




Source journal

Image and Vision Computing (Engineering and Technology: Engineering, Electrical and Electronic)

CiteScore: 8.50
Self-citation rate: 8.50%
Articles published: 143
Review time: 7.8 months

Journal description: Image and Vision Computing aims primarily to provide an effective medium for exchanging the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or applies such methods to real-world scenes. It seeks to deepen understanding of the discipline by encouraging quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.