STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking

arXiv - CS - Computer Vision and Pattern Recognition, Pub Date: 2024-09-17, DOI: arxiv-2409.11234

Jianbo Ma, Chuanming Tang, Fei Wu, Can Zhao, Jianlin Zhang, Zhiyong Xu



Abstract

Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision. Current MOT trackers rely on accurate object detection results and precise matching of target re-identification (ReID). These methods focus on optimizing the spatial attributes of targets while overlooking temporal cues when modelling object relationships, a gap that matters most under challenging tracking conditions such as object deformation and blurring. To address these issues, we propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT), which uses historical embedding features to model the representation of ReID and detection features in sequential order. Concretely, a temporal embedding boosting module enhances the discriminability of individual embeddings through cooperation between adjacent frames. The trajectory embedding is then propagated by a temporal detection refinement module to mine salient target locations in the temporal field. Extensive experiments on the VisDrone2019 and UAVDT datasets demonstrate that STCMOT achieves new state-of-the-art performance on the MOTA and IDF1 metrics. The source code is released at https://github.com/ydhcg-BoBo/STCMOT.
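For readers unfamiliar with the two metrics named above, the following is a minimal sketch of their standard definitions, not code from the STCMOT repository. The function names, argument names, and the counts in the usage lines are illustrative assumptions; only the formulas themselves are the commonly used definitions of MOTA and IDF1.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy.

    MOTA = 1 - (FN + FP + IDSW) / GT, with every count summed over all frames.
    The value can be negative when the errors outnumber the ground-truth objects.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1 score.

    IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN), where the identity-level
    counts come from the best one-to-one matching of predicted and
    ground-truth trajectories.
    """
    denominator = 2 * idtp + idfp + idfn
    return 2 * idtp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the formulas.
    print(f"MOTA: {mota(200, 100, 50, 1000):.2f}")
    print(f"IDF1: {idf1(700, 300, 300):.2f}")
```

MOTA mainly reflects detection quality (misses and false alarms dominate the numerator), while IDF1 rewards keeping the same identity on a target over time, which is why the two are reported together when both detection and ReID are being evaluated.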


