LV2DMOT: Language and Visual Multimodal Feature Learning for Multiobject Tracking

IEEE Sensors Journal (IF 4.3; Zone 2, multidisciplinary journals; JCR Q1, Engineering, Electrical & Electronic) Pub Date: 2025-01-07 DOI: 10.1109/JSEN.2024.3519903

Ru Hong;Zeyu Cai;Jiming Yang;Feipeng Da

IEEE Sensors Journal, vol. 25, no. 4, pp. 7482-7495. Journal Article. Published 2025-01-07. https://ieeexplore.ieee.org/document/10832530/

Citations: 0

Abstract

Multiobject tracking (MOT) aims to associate objects of the same identity across video frames, and robust similarity measurement is crucial for maintaining tracking performance. However, current methods integrate motion and appearance cues inefficiently, which often leads to tracking failures in challenging scenarios such as occlusions and missed detections. In this article, we introduce LV2DMOT, a tracker that employs a novel paradigm for integrating motion and appearance cues through language and visual multimodal feature learning, thereby generating more distinctive data association similarities. We propose three key techniques: 1) a text-matching task between tracking trajectories and candidate detections, which encodes detection geometric information as text and uses a temporal model, Mamba, to extract temporal motion features of trajectories, enabling more accurate motion similarity calculations; 2) a multimodal, multilevel feature fusion model that integrates motion and appearance features via a cross-modal learning mechanism, resulting in more robust fused similarities; and 3) a learnable temporal attention model for trajectory appearance feature updates, which aggregates historical visual features to improve the representational ability of trajectory appearance features and employs k-medoids for feature selection. Extensive experiments on the MOT17 and MOT20 datasets demonstrate that our method achieves state-of-the-art (SOTA) tracking performance.
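The abstract names k-medoids as the feature selection step for trajectory appearance updates but gives no further detail. As a rough illustration of the general idea only, the sketch below picks `k` representative feature vectors from a trajectory's history, so that a later aggregation step could work on a small, diverse subset. The function names, the cosine distance, the evenly spaced initialization, and the alternating update scheme are all assumptions made for this example, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; a zero vector is treated as distance 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def select_medoids(features: List[Vector], k: int, max_iter: int = 50) -> List[int]:
    """Return the sorted indices of k medoids among `features`.

    Hypothetical helper: a plain alternating k-medoids (assign, then update),
    chosen here for brevity. The paper may use a different variant.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(features)
    if k >= n:
        return list(range(n))  # nothing to select, keep every feature

    # Pairwise distances between all historical features.
    dist = [[cosine_distance(features[i], features[j]) for j in range(n)]
            for i in range(n)]

    # Assumed initialization: k distinct, evenly spaced positions in the history.
    medoids = [i * n // k for i in range(k)]

    for _ in range(max_iter):
        # Assignment step: each feature joins its nearest medoid's cluster.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            # A medoid always stays in its own cluster, so no cluster is empty.
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)

        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break  # converged
        medoids = new_medoids

    return sorted(medoids)
```

Calling `select_medoids(history, k)` on a list of per-frame appearance embeddings returns the positions of the retained features. A medoid is always one of the observed features rather than an average, which is one plausible reason to prefer k-medoids over k-means when the selected features should remain real observations.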
