Enhancing Text-Video Retrieval Performance With Low-Salient but Discriminative Objects

Yanwei Zheng;Bowen Huang;Zekai Chen;Dongxiao Yu
{"title":"Enhancing Text-Video Retrieval Performance With Low-Salient but Discriminative Objects","authors":"Yanwei Zheng;Bowen Huang;Zekai Chen;Dongxiao Yu","doi":"10.1109/TIP.2025.3527369","DOIUrl":null,"url":null,"abstract":"Text-video retrieval aims to establish a matching relationship between a video and its corresponding text. However, previous works have primarily focused on salient video subjects, such as humans or animals, often overlooking Low-Salient but Discriminative Objects (LSDOs) that play a critical role in understanding content. To address this limitation, we propose a novel model that enhances retrieval performance by emphasizing these overlooked elements across video and text modalities. In the video modality, our model first incorporates a feature selection module to gather video-level LSDO features, and applies cross-modal attention to assign frame-specific weights based on relevance, yielding frame-level LSDO features. In the text modality, text-level LSDO features are captured by generating multiple object prototypes in a sparse aggregation manner. Extensive experiments on benchmark datasets, including MSR-VTT, MSVD, LSMDC, and DiDeMo, demonstrate that our model achieves state-of-the-art results across various evaluation metrics.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"581-593"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10841928/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Text-video retrieval aims to establish a matching relationship between a video and its corresponding text. However, previous works have primarily focused on salient video subjects, such as humans or animals, often overlooking Low-Salient but Discriminative Objects (LSDOs) that play a critical role in understanding content. To address this limitation, we propose a novel model that enhances retrieval performance by emphasizing these overlooked elements across video and text modalities. In the video modality, our model first incorporates a feature selection module to gather video-level LSDO features, and applies cross-modal attention to assign frame-specific weights based on relevance, yielding frame-level LSDO features. In the text modality, text-level LSDO features are captured by generating multiple object prototypes in a sparse aggregation manner. Extensive experiments on benchmark datasets, including MSR-VTT, MSVD, LSMDC, and DiDeMo, demonstrate that our model achieves state-of-the-art results across various evaluation metrics.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于低显著性但有区别目标的文本视频检索性能增强
文本-视频检索的目的是在视频和相应的文本之间建立匹配关系。然而,以前的作品主要集中在突出的视频主题上,如人类或动物,往往忽略了在理解内容中起关键作用的低突出但判别对象(ldos)。为了解决这一限制,我们提出了一个新的模型,通过强调视频和文本模式中这些被忽视的元素来提高检索性能。在视频模态中,我们的模型首先集成了一个特征选择模块来收集视频级LSDO特征,并应用跨模态关注来根据相关性分配特定帧的权重,从而产生帧级LSDO特征。在文本模式中,通过稀疏聚合方式生成多个对象原型来捕获文本级LSDO特征。在包括MSR-VTT、MSVD、LSMDC和DiDeMo在内的基准数据集上进行的大量实验表明,我们的模型在各种评估指标上取得了最先进的结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Enhancing Text-Video Retrieval Performance With Low-Salient but Discriminative Objects Breaking Boundaries: Unifying Imaging and Compression for HDR Image Compression A Pyramid Fusion MLP for Dense Prediction IFENet: Interaction, Fusion, and Enhancement Network for V-D-T Salient Object Detection NeuralDiffuser: Neuroscience-Inspired Diffusion Guidance for fMRI Visual Reconstruction
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1