Visual Selection and Multistage Reasoning for RSVG

Yueli Ding; Haojie Xu; Di Wang; Ke Li; Yumin Tian
IEEE Geoscience and Remote Sensing Letters, vol. 21, pp. 1-5. Published 2024-04-08.
DOI: 10.1109/LGRS.2024.3386311
https://ieeexplore.ieee.org/document/10494585/

Abstract

Remote sensing visual grounding (RSVG) is the task of locating the targets indicated by referring expressions in remote sensing (RS) images. Previous approaches directly concatenate visual and language features and stack a series of transformer encoders for cross-modal fusion. However, this fusion strategy does not fully exploit the attributes and contextual information about the targets that the referring expressions contain, which limits the performance of existing methods. To address this issue, we propose a novel visual grounding framework for RSVG, named VSMR, which achieves accurate localization by adaptively selecting target-relevant features and performing multistage cross-modal reasoning. Specifically, we propose an adaptive feature selection (AFS) module, which automatically selects the visual features relevant to a query while suppressing background noise. A multistage decoder (MSD) then iteratively infers correlations between images and queries by leveraging the object attributes and contextual information in the referring expressions, thereby localizing the target accurately. Experiments demonstrate that our method outperforms other state-of-the-art (SoTA) methods, achieving an accuracy of 78.24%.
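The abstract gives no implementation details for AFS or MSD, so the following is only a minimal sketch of the two ideas under stated assumptions, not the authors' method. It assumes that feature selection can be approximated by a sigmoid gate on the similarity between each visual token and a mean-pooled query, and that multistage reasoning can be approximated by a target vector that alternately attends to text tokens and to the gated visual tokens with residual updates. All function names (`select_features`, `multistage_decode`, and the helpers) and the toy vectors are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_pool(tokens):
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]

def attend(query, tokens):
    """Single-head scaled dot-product attention of one query over tokens."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    dim = len(tokens[0])
    pooled = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
    return pooled, weights

def select_features(visual_tokens, text_tokens):
    """Hypothetical stand-in for AFS: scale each visual token by a sigmoid
    gate on its similarity to the pooled query, damping unrelated tokens."""
    q = mean_pool(text_tokens)
    gated = []
    for v in visual_tokens:
        gate = 1.0 / (1.0 + math.exp(-dot(v, q)))
        gated.append([gate * x for x in v])
    return gated

def multistage_decode(visual_tokens, text_tokens, num_stages=3):
    """Hypothetical stand-in for MSD: refine a target vector over several
    stages, attending to the text and then to the visual tokens each time."""
    target = mean_pool(text_tokens)
    weights = []
    for _ in range(num_stages):
        text_ctx, _ = attend(target, text_tokens)
        target = [t + c for t, c in zip(target, text_ctx)]  # residual update
        vis_ctx, weights = attend(target, visual_tokens)
        target = [t + c for t, c in zip(target, vis_ctx)]   # residual update
    return target, weights

# Toy example: three 2-D visual tokens and two 2-D text tokens (made-up values).
visual = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
text = [[1.0, 0.0], [0.8, 0.2]]

selected = select_features(visual, text)
target, weights = multistage_decode(selected, text)
best = weights.index(max(weights))  # visual token the final stage attends to most
```

In a real grounding model the final target representation would feed a box regression head, and the gates and attention projections would be learned. Here they are fixed so that the data flow between selection and staged decoding stays easy to follow.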