Title: Sparse-Guided Partial Dense for Cross-Modal Remote Sensing Image–Text Retrieval
Authors: Zuopeng Zhao; Xiaoran Miao; Lei Liu; Xinzheng Xu; Ying Liu; Jianfeng Hu; Bingbing Min; Yumeng Gao; Kanyaphakphachsorn Pharksuwan
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-13
Published: 2025-03-28
DOI: 10.1109/TGRS.2025.3555956
URL: https://ieeexplore.ieee.org/document/10945442/
Impact factor: 8.6 (JCR Q1, Engineering, Electrical & Electronic)
Citation count: 0
Abstract
Cross-modal remote sensing image-text retrieval (CMRSITR) involves retrieving relevant samples in one modality based on a query from another modality. Previous dense retrieval methods utilizing multivector dense representations have significantly enhanced retrieval performance. Meanwhile, recent advances in sparse retrieval have demonstrated that sparse representations offer comparable performance with enhanced interpretability and faster retrieval speeds. However, effectively integrating the strengths of these two paradigms to enable efficient and accurate retrieval in large-scale remote sensing (RS) image-text datasets remains an open challenge. In this study, we propose sparse-guided partial dense (SGPD) cross-modal retrieval, a novel approach that efficiently transforms dense vectors from pretrained dense retrieval models into sparse representations and leverages the overlap between sparse retrieval results and dense vector clusters to achieve high-precision and fast retrieval. By probabilistically selecting a limited number of dense clusters containing top sparse results, SGPD ensures retrieval efficiency while minimizing both memory and time costs. Designed as a plug-and-play solution, SGPD can be seamlessly integrated into existing RS image-text retrieval (RSITR) models without requiring modifications to their architectures. Extensive experiments on RS image-text datasets of varying scales demonstrate that SGPD achieves retrieval accuracy comparable to dense retrieval methods while significantly reducing training time and memory consumption.
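The retrieval flow described in the abstract (sparse search first, then dense scoring restricted to a few clusters that contain the top sparse hits) can be illustrated with a small sketch. The abstract does not specify how dense vectors are sparsified, how clusters are built, or what probability governs cluster selection, so everything below is an assumption made for illustration: top-k magnitude sparsification, precomputed cluster labels, and sampling clusters with probability proportional to their share of the top sparse hits. The names `sparsify`, `sparse_score`, `dense_score`, and `sgpd_search` are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def sparsify(vec, k):
    """Keep the k largest-magnitude entries as {dimension: value} (assumed scheme)."""
    top = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    return {i: vec[i] for i in top}


def sparse_score(q, d):
    """Dot product over the dimensions that both sparse vectors keep."""
    return sum(v * d[i] for i, v in q.items() if i in d)


def dense_score(q, d):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0


def sgpd_search(query, docs, cluster_of, k_sparse=2, top_sparse=2,
                n_clusters=1, top_k=2, seed=0):
    """Sparse-guided partial dense search (illustrative sketch, not the authors' code)."""
    rng = random.Random(seed)

    # Stage 1: cheap sparse retrieval over the whole collection.
    q_sparse = sparsify(query, k_sparse)
    doc_sparse = [sparsify(d, k_sparse) for d in docs]  # precomputed offline in practice
    ranked = sorted(range(len(docs)),
                    key=lambda i: sparse_score(q_sparse, doc_sparse[i]),
                    reverse=True)
    hits = ranked[:top_sparse]

    # Stage 2: sample clusters without replacement, weighted by how many
    # top sparse hits each cluster contains (assumed selection rule).
    pool = dict(Counter(cluster_of[i] for i in hits))
    chosen = set()
    while pool and len(chosen) < n_clusters:
        ids = list(pool)
        pick = rng.choices(ids, weights=[pool[c] for c in ids], k=1)[0]
        chosen.add(pick)
        del pool[pick]

    # Stage 3: dense scoring only inside the chosen clusters.
    candidates = [i for i in range(len(docs)) if cluster_of[i] in chosen]
    candidates.sort(key=lambda i: dense_score(query, docs[i]), reverse=True)
    return candidates[:top_k], chosen


# Toy data: two clusters of 4-dimensional embeddings.
docs = [[0.9, 0.1, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0],
        [0.0, 0.0, 0.9, 0.1], [0.0, 0.1, 0.8, 0.0]]
cluster_of = [0, 0, 1, 1]
query = [1.0, 0.05, 0.0, 0.0]

results, probed = sgpd_search(query, docs, cluster_of)
print(results, probed)
```

In this toy run only the cluster holding the top sparse hits is scored densely, so two of the four documents are never compared against the query with the dense model. That skipped work is where a scheme of this kind would save time and memory on a large collection.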
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.