LV2DMOT: Language and Visual Multimodal Feature Learning for Multiobject Tracking

IEEE Sensors Journal (IF 4.3; CAS Zone 2, Multidisciplinary Journals; JCR Q1, Engineering, Electrical & Electronic) Pub Date: 2025-01-07 DOI: 10.1109/JSEN.2024.3519903
Ru Hong;Zeyu Cai;Jiming Yang;Feipeng Da
IEEE Sensors Journal, vol. 25, no. 4, pp. 7482-7495. Journal article, not open access. Source: https://ieeexplore.ieee.org/document/10832530/
Citations: 0

Abstract

Multiobject tracking (MOT) aims to associate objects of the same identity across video frames, and robust similarity measurement is crucial for maintaining tracking performance. However, motion and appearance cues are currently integrated inefficiently, which often leads to tracking failures in challenging scenarios such as occlusions and missed detections. In this article, we introduce LV2DMOT, a tracker that employs a novel paradigm for integrating motion and appearance cues through language and visual multimodal feature learning, thereby generating more distinctive data association similarities. We propose three key techniques: 1) a text-matching task between tracking trajectories and candidate detections, which uses text encoding of detection geometric information combined with a temporal model, Mamba, to extract temporal motion features of trajectories, enabling more accurate motion similarity calculations; 2) a multimodal, multilevel feature fusion model that integrates motion and appearance features via a cross-modal learning mechanism, resulting in more robust fused similarities; and 3) a learnable temporal attention model for trajectory appearance feature updates, which aggregates historical visual features to improve the representational ability of trajectory appearance features, employing k-medoids for feature selection. Extensive experiments on the MOT17 and MOT20 datasets demonstrate that our method achieves state-of-the-art (SOTA) tracking performance.
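The abstract names k-medoids as the feature selection step for trajectory appearance features but does not say how it is applied. The following is a minimal sketch of generic k-medoids selection over one trajectory's stored appearance vectors, not the authors' implementation. The cosine distance, the evenly spaced initialisation, and the names `cosine_distance`, `k_medoids`, and `history` are all assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def k_medoids(features, k, max_iter=50):
    """Select k representative feature vectors (hypothetical helper).

    Uses the alternating assign/update form of k-medoids and returns
    the sorted indices of the chosen medoids within `features`.
    """
    n = len(features)
    if k >= n:
        return list(range(n))

    # Pairwise distances, computed once.
    dist = [[cosine_distance(features[i], features[j]) for j in range(n)]
            for i in range(n)]

    # Assumed initialisation: indices evenly spaced over the history.
    medoids = [i * n // k for i in range(k)]

    for _ in range(max_iter):
        # Assignment step: each feature joins its nearest medoid;
        # a medoid always stays in its own cluster.
        medoid_set = set(medoids)
        clusters = {m: [] for m in medoids}
        for i in range(n):
            if i in medoid_set:
                nearest = i
            else:
                nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)

        # Update step: the new medoid of a cluster is the member with
        # the smallest total distance to the other members.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids

    return sorted(medoids)


if __name__ == "__main__":
    # Toy appearance history: two visually distinct groups of embeddings.
    history = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05],
               [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]]
    print(k_medoids(history, 2))
```

Because medoids are actual stored features rather than averages, the selected representatives are real observations of the target. That is one plausible reason to prefer k-medoids over k-means for this kind of selection, though the abstract does not state the authors' motivation.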
Source journal: IEEE Sensors Journal (Engineering and Technology: Engineering, Electrical & Electronic)
CiteScore: 7.70
Self-citation rate: 14.00%
Articles published: 2058
Review time: 5.2 months
Journal scope: The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing and applications of devices for sensing and transducing physical, chemical and biological phenomena, with emphasis on the electronics and physics aspect of sensors and integrated sensors-actuators. IEEE Sensors Journal deals with the following:
- Sensor Phenomenology, Modelling, and Evaluation
- Sensor Materials, Processing, and Fabrication
- Chemical and Gas Sensors
- Microfluidics and Biosensors
- Optical Sensors
- Physical Sensors: Temperature, Mechanical, Magnetic, and others
- Acoustic and Ultrasonic Sensors
- Sensor Packaging
- Sensor Networks
- Sensor Applications
- Sensor Systems: Signals, Processing, and Interfaces
- Actuators and Sensor Power Systems
- Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
- Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave, e.g., electromagnetic and acoustic, and non-wave, e.g., chemical, gravity, particle, thermal, radiative and non-radiative, sensor data; detection, estimation and classification based on sensor data)
- Sensors in Industrial Practice