TLSH-MOT: Drone-View Video Multiple Object Tracking via Transformer-Based Locally Sensitive Hash
Yubin Yuan; Yiquan Wu; Langyue Zhao; Yuqi Liu; Yaxuan Pang
IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-16, published 2025-02-25.
DOI: 10.1109/TGRS.2025.3545081 | IEEE Xplore: https://ieeexplore.ieee.org/document/10902600/
Citations: 0
Abstract
Multiple object tracking (MOT) plays an essential role in drone-view remote sensing applications such as urban management, emergency rescue, and maritime monitoring. However, due to large variations in object scale and position, frequent feature loss across frames, and difficulties in matching, traditional methods struggle to achieve high tracking accuracy in such challenging environments. To address these issues, we propose a Transformer-based locally sensitive hash MOT (TLSH-MOT) method for drone-view remote sensing scenarios. First, a frame-level feature extraction and enhancement module is introduced, integrating a nominee proposal generation (NPG) unit and a tilt convolutional vision Transformer (ViT), which enables adaptive detection of objects across varying scales and perspectives. Next, a spatiotemporal memory (STM) structure is designed to mitigate instantaneous environmental interference and periodic changes using short-term and long-term memory blocks, thereby enhancing tracking stability under complex meteorological conditions. In addition, a temporal enhancement feature decoder (TEFD) fuses multisource feature information to better capture the motion patterns of remote sensing objects. Finally, a locally sensitive hash (LSH) IDLinker ensures efficient feature matching, significantly improving trajectory association in large-scale monitoring scenarios. Experimental results show that TLSH-MOT achieves MOT accuracy of 40.7% and 62.2% on the VisDrone and UAVDT datasets, respectively, verifying the superiority of TLSH-MOT in the remote sensing video tracking field. The framework's code is released at: https://github.com/YubinYuan/TLSH-MOT.
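To make the LSH IDLinker idea concrete, the sketch below shows the general locality-sensitive hashing technique it builds on: hashing appearance features into buckets so that only near-duplicate embeddings are compared during ID association, rather than scanning every stored track. This is a minimal illustration assuming random-hyperplane LSH over unit-normalized feature vectors; the class and function names (`RandomHyperplaneLSH`, `insert`, `query`) are hypothetical, and the paper's actual IDLinker implementation is available in the linked repository.

```python
# Minimal sketch of random-hyperplane LSH for appearance-feature matching.
# Illustrative only -- assumed names, not the TLSH-MOT IDLinker itself.
import numpy as np
from collections import defaultdict


class RandomHyperplaneLSH:
    """Hash unit-normalized feature vectors into buckets so that embeddings
    with high cosine similarity tend to land in the same bucket."""

    def __init__(self, dim, num_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        # Each row is a random hyperplane normal; the sign pattern of the
        # projections forms the bucket key.
        self.planes = rng.standard_normal((num_bits, dim))
        self.buckets = defaultdict(list)  # key -> list of (track_id, feature)

    def _key(self, feature):
        bits = (self.planes @ feature) > 0
        return bits.tobytes()

    def insert(self, track_id, feature):
        feature = feature / (np.linalg.norm(feature) + 1e-12)
        self.buckets[self._key(feature)].append((track_id, feature))

    def query(self, feature):
        """Return candidate (track_id, cosine) pairs from the matching
        bucket, sorted by similarity, instead of scanning all tracks."""
        feature = feature / (np.linalg.norm(feature) + 1e-12)
        candidates = self.buckets.get(self._key(feature), [])
        scored = [(tid, float(f @ feature)) for tid, f in candidates]
        return sorted(scored, key=lambda x: -x[1])


# Usage: index an existing track feature, then match a new detection.
lsh = RandomHyperplaneLSH(dim=128)
rng = np.random.default_rng(1)
track_feat = rng.standard_normal(128)
lsh.insert(track_id=7, feature=track_feat)
detection = track_feat + 0.05 * rng.standard_normal(128)  # small perturbation
print(lsh.query(detection))  # likely [(7, ~0.99)]: same bucket, high cosine
```

Because each query only inspects one bucket, the expected matching cost stays sublinear in the number of stored tracks, which is the property that makes LSH attractive for trajectory association in large-scale monitoring scenarios.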
Journal Description
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.