RAMSF: A Novel Generic Framework for Optical Remote Sensing Multimodal Spatial-Spectral Fusion

Authors: Chuang Liu; Zhiqi Zhang; Mi Wang
DOI: 10.1109/TGRS.2025.3552937
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-22 (Journal Article)
Published: 2025-03-19
Impact factor: 8.6 (JCR Q1, Engineering, Electrical & Electronic; CAS Region 1, Earth Science)
URL: https://ieeexplore.ieee.org/document/10934049/
Citations: 0
Abstract
Optical remote sensing (ORS) multimodal spatial-spectral fusion (MSF) aims to obtain high-resolution images containing fine-grained spatial details and high-fidelity spectral information, which are crucial for downstream tasks and real-world applications. Existing methods can yield promising outcomes in specific fusion scenarios. However, due to the coarse representation of spatial details and the imprecise alignment of spatial-spectral features, the majority of methods have difficulty balancing spatial and spectral preservation. This imbalance tends to cause distortion in the fused image, rendering these task-specific methods less adaptable and harder to apply simultaneously to different ORS-MSF tasks. To address this gap, this article introduces a generic framework that focuses on generalization and practical applicability, rather than solely optimizing model performance on a specific fusion task. Through a comprehensive analysis of theoretical models and network architectures, we systematically decompose the fusion process into two distinct phases: detail reconstruction and feature alignment. Consequently, the proposed framework consists of two fundamental components: low-frequency-driven high-frequency salient detail reconstruction (LHSDR) and coordinate-modal-guided spatial-spectral feature progressive alignment (CSFPA). In LHSDR, the joint spatial degradation process in various frequency directions is estimated from diverse modal data, and salient details are derived through hierarchical integration, with low frequencies driving high ones. These coupled high-frequency details lay the foundation for subsequent high-fidelity fusion. Furthermore, CSFPA estimates the joint spectral degradation process by establishing coordinate-modal relations between the coupled high-frequency details and the corresponding spectral information in the continuous domain.
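The "continuous domain" relation that CSFPA relies on can be grounded in a familiar primitive: sampling a feature map at non-integer coordinates. The following is a minimal numpy sketch of bilinear sampling at continuous (y, x) positions; it is an illustration of the underlying idea only, not the paper's learned, coordinate-modal-guided module, and the function name is ours.

```python
import numpy as np

def bilinear_sample(feat, coords):
    """Sample a feature map feat of shape (H, W, C) at continuous
    (y, x) coordinates with bilinear interpolation.

    coords: array of shape (N, 2), each row a (y, x) position in
    pixel units. Returns an array of shape (N, C).
    """
    H, W, _ = feat.shape
    y, x = coords[:, 0], coords[:, 1]
    # Integer corners of the interpolation cell, clipped to stay in bounds.
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    # Fractional offsets inside the cell.
    wy = (y - y0)[:, None]
    wx = (x - x0)[:, None]
    # Interpolate along x on the top and bottom rows, then along y.
    top = feat[y0, x0] * (1 - wx) + feat[y0, x0 + 1] * wx
    bot = feat[y0 + 1, x0] * (1 - wx) + feat[y0 + 1, x0 + 1] * wx
    return top * (1 - wy) + bot * wy
```

Because the coordinates are continuous rather than tied to a fixed pixel grid, features from modalities at different resolutions can be queried at the same physical locations, which is the intuition behind aligning spatial and spectral features in a shared continuous domain.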
As a result, fused images with high spatial-spectral fidelity are obtained through fine detail reconstruction and accurate feature alignment. Test sets drawn from three different ORS-MSF tasks are used in the experiments, comprising eight simulated and five real test sets. Our proposed methodology demonstrates robust fusion performance and generalization capability on data with different spectral bands at various resolutions. All implementations will be published on our website.
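For readers unfamiliar with spatial-spectral fusion, the detail-reconstruction phase can be illustrated with a classical high-pass injection baseline: extract high-frequency detail from the high-resolution modality and inject it into the upsampled low-resolution spectral cube. This is a rough sketch of the general idea only, not the RAMSF method; the function and parameter names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def highpass_injection_fusion(pan, ms, ratio=4, k=3):
    """Fuse a high-resolution panchromatic band pan of shape (H, W)
    with a low-resolution multispectral cube ms of shape
    (H // ratio, W // ratio, B) by high-pass detail injection."""
    # Nearest-neighbour upsampling of the MS cube to the PAN grid.
    ms_up = ms.repeat(ratio, axis=0).repeat(ratio, axis=1)
    # Low-frequency PAN component via a k x k box blur;
    # the residual is the high-frequency spatial detail.
    padded = np.pad(pan, k // 2, mode="edge")
    pan_low = sliding_window_view(padded, (k, k)).mean(axis=(-2, -1))
    detail = pan - pan_low
    # Inject the same spatial detail into every spectral band.
    return ms_up + detail[..., None]
```

Methods of this family are known to trade spectral fidelity for spatial sharpness, since the same detail is injected into every band regardless of its spectral response; this is precisely the spatial-spectral imbalance that the abstract's two-phase decomposition (detail reconstruction followed by feature alignment) is designed to address.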
Journal introduction:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.