Transcending Fusion: A Multiscale Alignment Method for Remote Sensing Image–Text Retrieval

IF 7.5 · CAS Tier 1 (Earth Science) · JCR Q1 (Engineering, Electrical & Electronic) · IEEE Transactions on Geoscience and Remote Sensing · Pub Date: 2024-11-19 · DOI: 10.1109/TGRS.2024.3496898
Rui Yang;Shuang Wang;Yingping Han;Yuanheng Li;Dong Zhao;Dou Quan;Yanhe Guo;Licheng Jiao;Zhi Yang
{"title":"Transcending Fusion: A Multiscale Alignment Method for Remote Sensing Image–Text Retrieval","authors":"Rui Yang;Shuang Wang;Yingping Han;Yuanheng Li;Dong Zhao;Dou Quan;Yanhe Guo;Licheng Jiao;Zhi Yang","doi":"10.1109/TGRS.2024.3496898","DOIUrl":null,"url":null,"abstract":"Remote sensing image-text retrieval (RSITR) is pivotal for knowledge services and data mining in the remote sensing (RS) domain. Considering the multiscale representations in image content and text vocabulary can enable the models to learn richer representations and enhance retrieval. Current multiscale RSITR approaches typically align multiscale fused image features with text features but overlook aligning image-text pairs at distinct scales separately. This oversight restricts their ability to learn joint representations suitable for effective retrieval. We introduce a novel multiscale alignment (MSA) method to overcome this limitation. Our method comprises three key innovations: 1) a multiscale cross-modal alignment transformer (MSCMAT), which computes cross-attention between single-scale image features and localized text features, integrating global textual context to derive a matching score matrix within a mini-batch; 2) a multiscale cross-modal semantic alignment loss (MSCMA loss) that enforces semantic alignment across scales; and 3) a cross-scale multimodal semantic consistency loss (CSMMC loss) that uses the matching matrix from the largest scale to guide alignment at smaller scales. We evaluated our method across multiple datasets, demonstrating its efficacy with various visual backbones and establishing its superiority over existing state-of-the-art methods. The GitHub URL for our project is \n<uri>https://github.com/yr666666/MSA</uri>\n.","PeriodicalId":13213,"journal":{"name":"IEEE Transactions on Geoscience and Remote Sensing","volume":"62 ","pages":"1-17"},"PeriodicalIF":7.5000,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Geoscience and Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10758255/","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Remote sensing image-text retrieval (RSITR) is pivotal for knowledge services and data mining in the remote sensing (RS) domain. Accounting for the multiscale representations in image content and text vocabulary enables models to learn richer representations and enhances retrieval. Current multiscale RSITR approaches typically align multiscale fused image features with text features but overlook aligning image-text pairs at distinct scales separately. This oversight restricts their ability to learn joint representations suitable for effective retrieval. We introduce a novel multiscale alignment (MSA) method to overcome this limitation. Our method comprises three key innovations: 1) a multiscale cross-modal alignment transformer (MSCMAT), which computes cross-attention between single-scale image features and localized text features, integrating global textual context to derive a matching score matrix within a mini-batch; 2) a multiscale cross-modal semantic alignment loss (MSCMA loss) that enforces semantic alignment across scales; and 3) a cross-scale multimodal semantic consistency loss (CSMMC loss) that uses the matching matrix from the largest scale to guide alignment at smaller scales. We evaluated our method across multiple datasets, demonstrating its efficacy with various visual backbones and establishing its superiority over existing state-of-the-art methods. The GitHub URL for our project is https://github.com/yr666666/MSA.
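To make the three components concrete, here is a minimal PyTorch-style sketch of the training objective, under stated assumptions; it is an illustration, not the authors' released implementation (see the GitHub link above). It assumes an InfoNCE-style form for the per-scale MSCMA loss and a KL-divergence form for the CSMMC consistency loss; the function names, the temperature, and the weight lam are hypothetical, and plain cosine scores stand in for the matching matrix that MSCMAT would produce via cross-attention.

import torch
import torch.nn.functional as F

def scale_matching_matrix(img_feats, txt_feats, temperature=0.07):
    # Cosine-similarity matching scores for one scale within a mini-batch.
    # img_feats, txt_feats: (B, D) embeddings for B image-text pairs; the
    # diagonal of the returned (B, B) matrix holds the matched pairs.
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    return img @ txt.t() / temperature

def mscma_loss(scores):
    # Per-scale semantic alignment (assumed InfoNCE form): symmetric
    # cross-entropy pulling each image toward its own caption and vice versa.
    targets = torch.arange(scores.size(0), device=scores.device)
    return 0.5 * (F.cross_entropy(scores, targets) +
                  F.cross_entropy(scores.t(), targets))

def csmmc_loss(small_scores, large_scores):
    # Cross-scale consistency (assumed KL form): the largest scale's
    # detached matching distribution guides a smaller scale.
    teacher = F.softmax(large_scores.detach(), dim=-1)
    student = F.log_softmax(small_scores, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean")

def msa_total_loss(per_scale_img_feats, txt_feats, lam=1.0):
    # per_scale_img_feats: list of (B, D) tensors ordered small -> large.
    # Align every scale with the text separately, then anchor the smaller
    # scales to the largest one, as the abstract describes.
    scores = [scale_matching_matrix(f, txt_feats) for f in per_scale_img_feats]
    align = sum(mscma_loss(s) for s in scores)
    consist = sum(csmmc_loss(s, scores[-1]) for s in scores[:-1])
    return align + lam * consist

Detaching the largest-scale scores treats them as a fixed teacher, so the consistency term only pushes the smaller scales toward the coarse-scale matching structure rather than letting the small scales degrade it.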
Source Journal
IEEE Transactions on Geoscience and Remote Sensing (Engineering & Technology - Geochemistry & Geophysics)
CiteScore: 11.50
Self-citation rate: 28.00%
Annual article output: 1912
Review time: 4.0 months
Journal Description: IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.
Latest Articles in This Journal
- MST-Net: A General Deep Learning Model for Thick Cloud Removal from Optical Images
- Spatial-Frequency Residual-guided Dynamic Perceptual Network for Remote Sensing Image Haze Removal
- Joint Classification of Hyperspectral Images and LiDAR Data Based on Candidate Pseudo Labels Pruning and Dual Mixture of Experts
- Scene Adaptive SAR Incremental Target Detection via Context-Aware Attention and Gaussian-Box Similarity Metric
- Multi-dimensional Remote Sensing Change Detection Based on Siamese Dual-Branch Networks