Context-Aware Local–Global Semantic Alignment for Remote Sensing Image–Text Retrieval

IEEE Transactions on Geoscience and Remote Sensing (IF 8.6; Zone 1, Earth Sciences; JCR Q1, Engineering, Electrical & Electronic) Pub Date: 2025-03-17 DOI: 10.1109/TGRS.2025.3552304
Xiumei Chen;Xiangtao Zheng;Xiaoqiang Lu
Journal article, vol. 63, pp. 1-12. Not open access. IEEE Xplore: https://ieeexplore.ieee.org/document/10930665/
Citations: 0

Abstract

Remote sensing image-text retrieval (RSITR) is a cross-modal task that integrates visual and textual information, attracting significant attention in remote sensing research. Remote sensing images typically contain complex scenes with abundant details, presenting significant challenges for accurate semantic alignment between images and texts. Despite advances in the field, achieving precise alignment in such intricate contexts remains a major hurdle. To address this challenge, this article introduces a novel context-aware local-global semantic alignment (CLGSA) method. The proposed method consists of two key modules: the local key feature alignment (LKFA) module and the cross-sample global semantic alignment (CGSA) module. The LKFA module incorporates a local image masking and reconstruction task to improve the alignment between image and text features. Specifically, this module masks certain regions of the image and uses text context information to guide the reconstruction of the masked areas, enhancing the alignment of local semantics and ensuring more accurate retrieval of region-specific content. The CGSA module employs a hard sample triplet loss to improve global semantic consistency. By prioritizing difficult samples during training, this module refines feature space distributions, helping the model better capture global semantics across the entire image-text pair. A series of extensive experiments demonstrates the effectiveness of the proposed method. The method achieves an mR score of 32.07% on the RSICD dataset and 46.63% on the RSITMD dataset, outperforming baseline methods and confirming the robustness and accuracy of the approach.
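
The abstract says the CGSA module uses a hard sample triplet loss but gives no formula. The block below is a minimal sketch of one common formulation, assuming cosine similarity, in-batch hardest negatives, and a margin of 0.2. These choices and all function names are illustrative assumptions, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def similarity_matrix(image_embs, text_embs):
    """Cosine similarities: sim[i][j] compares image i with text j."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def hard_triplet_loss(sim, margin=0.2):
    """Bidirectional triplet loss using only the hardest in-batch negative.

    Assumes a square matrix whose diagonal holds the matched image-text
    pairs. The margin value is an assumed hyperparameter.
    """
    n = len(sim)
    if n < 2:
        raise ValueError("need at least two pairs to form a negative")
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Hardest negative text for image i: the most similar wrong caption.
        hardest_text = max(sim[i][j] for j in range(n) if j != i)
        # Hardest negative image for text i: the most similar wrong image.
        hardest_image = max(sim[j][i] for j in range(n) if j != i)
        total += max(0.0, margin + hardest_text - pos)
        total += max(0.0, margin + hardest_image - pos)
    return total / n


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[1.0, 0.0], [0.0, 1.0]]
    print(hard_triplet_loss(similarity_matrix(images, texts)))
```

Keeping only the hardest negative per anchor concentrates the gradient on the pairs the model currently confuses, which matches the abstract's description of prioritizing difficult samples during training.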
Source journal
IEEE Transactions on Geoscience and Remote Sensing (Engineering & Technology: Geochemistry & Geophysics)
CiteScore: 11.50
Self-citation rate: 28.00%
Articles published: 1912
Review time: 4.0 months
About the journal: IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.