{"title":"MMCANet A Multimodal and Cross-Attention Network for Cloud Removal and Exploration of Progressive Remote Sensing Images Restoration Algorithm","authors":"Yejian Zhou;Jiahui Suo;Yachen Wang;Jie Su;Wen Xiao;Zhen Hong;Rajiv Ranjan;Lizhe Wang;Zhenyu Wen","doi":"10.1109/TGRS.2025.3556560","DOIUrl":null,"url":null,"abstract":"In Earth observation, cloud severely affects the interpretation of optical satellites generated high-resolution images. Cloud-free optical images are vital for downstream tasks such as semantic segmentation and object detection. Thus, the elimination of clouds from optical imagery has emerged as a significant topic in remote sensing. Currently, most existing methods are proposed to leverage the texture information from auxiliary synthetic aperture radar (SAR) images to restore cloud-free images via direct channel merging. However, such a unified feature extraction approach often neglects the inherent distribution disparity between SAR and optical images—the result of differing imaging principles-potentially leading to significant feature loss. To this end, we introduce a network by jointing SAR and optical images multimodal and cross-attention network (MMCANet) to effectively extract multiscale contextual features from SAR imagery and integrate them with optical features. Specifically, instead of simple concatenation of the channels of SAR and optical images, we obtain high-dimensional features from them through independent feature extractors. The integration of these features is facilitated by a cross-attention mechanism that provides a more fine-grained amalgamation of information. Meanwhile, an atrous spatial pyramid pooling (ASPP) module is introduced into the integration of high-level features, which captures multiscale contextual information around clouded areas. 
In addition, we propose four advanced remote sensing image restoration algorithms that approach image restoration as a series of subtasks, gradually eliminating clouds to enhance performance. Comprehensive assessments show that MMCANet performs well on the SEN 12 MS-CR dataset with peak signal-to-noise ratio (PSNR) of 39.8871, structural similarity index (SSIM) of 0.9672, mean absolute error (MAE) of 0.0081, and spectral angle mapper (SAM) of 2.9884.","PeriodicalId":13213,"journal":{"name":"IEEE Transactions on Geoscience and Remote Sensing","volume":"63 ","pages":"1-13"},"PeriodicalIF":8.6000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Geoscience and Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10946262/","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
In Earth observation, clouds severely hinder the interpretation of the high-resolution images generated by optical satellites. Cloud-free optical images are vital for downstream tasks such as semantic segmentation and object detection, so removing clouds from optical imagery has emerged as a significant topic in remote sensing. Most existing methods leverage texture information from auxiliary synthetic aperture radar (SAR) images to restore cloud-free images via direct channel merging. However, such a unified feature extraction approach often neglects the inherent distribution disparity between SAR and optical images (a consequence of their differing imaging principles), potentially leading to significant feature loss. To this end, we introduce a multimodal and cross-attention network (MMCANet) that jointly exploits SAR and optical images to effectively extract multiscale contextual features from SAR imagery and integrate them with optical features. Specifically, instead of simply concatenating the channels of SAR and optical images, we obtain high-dimensional features from each modality through independent feature extractors. These features are then integrated by a cross-attention mechanism that provides a more fine-grained amalgamation of information. Meanwhile, an atrous spatial pyramid pooling (ASPP) module is introduced into the integration of high-level features to capture multiscale contextual information around clouded areas. In addition, we propose four advanced remote sensing image restoration algorithms that treat image restoration as a series of subtasks, gradually eliminating clouds to enhance performance. Comprehensive assessments show that MMCANet performs well on the SEN12MS-CR dataset, with a peak signal-to-noise ratio (PSNR) of 39.8871, structural similarity index (SSIM) of 0.9672, mean absolute error (MAE) of 0.0081, and spectral angle mapper (SAM) of 2.9884.
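The abstract does not give MMCANet's internals, but the core fusion idea it describes (optical features attending over SAR features rather than channel concatenation) can be sketched with scaled dot-product cross-attention. This is a minimal, illustrative NumPy version: function names, token counts, and feature dimensions are assumptions, not the paper's implementation, and the real network would operate on learned convolutional feature maps with projection weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(opt_feat, sar_feat):
    """Illustrative cross-modal fusion: optical tokens act as queries,
    SAR tokens supply keys and values (shapes: (n_tokens, dim))."""
    d_k = opt_feat.shape[-1]
    scores = opt_feat @ sar_feat.T / np.sqrt(d_k)  # (n_opt, n_sar) affinities
    weights = softmax(scores, axis=-1)             # attend over SAR tokens
    return weights @ sar_feat                      # SAR info aggregated per optical token

# toy example: 16 flattened optical tokens query 16 SAR tokens, dim 8
rng = np.random.default_rng(0)
opt = rng.standard_normal((16, 8))
sar = rng.standard_normal((16, 8))
fused = cross_attention(opt, sar)  # same shape as the optical features
```

In a full model the fused output would typically be added back to (or concatenated with) the optical branch before decoding; the point of the sketch is only that each optical location selects relevant SAR context via attention weights instead of a fixed channel merge.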
Journal overview:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.