Yong Yang;Mengzhen Li;Shuying Huang;Weiguo Wan;Hangyuan Lu;Wei Tu
{"title":"VSDM: Variable-Scale Diffusion Model Based on Dynamic Condition Guidance for Pansharpening","authors":"Yong Yang;Mengzhen Li;Shuying Huang;Weiguo Wan;Hangyuan Lu;Wei Tu","doi":"10.1109/TGRS.2024.3504857","DOIUrl":null,"url":null,"abstract":"Pansharpening aims to obtain a high-spatial-resolution multispectral (MS) image by fusing a lower-spatial resolution MS image with a high-spatial-resolution panchromatic (PAN) image. Currently, the results obtained by most pansharpening methods still suffer from spatial and spectral distortion issues. The diffusion model has shown outstanding performance in various image-processing tasks. However, maintaining the full image size throughout the diffusion process imposes a large computational burden, and the simultaneous use of PAN and MS images acquired by different sensors as a condition for guiding noise prediction leads to spatial and spectral distortions. To solve these problems, a variable-scale diffusion model (VSDM) based on dynamic condition guidance for pansharpening is proposed, which achieves better fusion performance by improving the diffusion manner of the diffusion model and injecting dynamic conditions to guide the reverse process. In VSDM, a variable-scale diffusion manner (VSDMN) is designed to reduce the computational complexity of the model by reducing the size of the image in the diffusion process. A condition generator (CG) is constructed to generate dynamic conditions using the features learned from the PAN and upsampled MS images. In CG, a cross-attention dynamic convolution is built to extract features from the PAN image by designing a spatial and spectral attention mechanism, which can improve the spatial and spectral consistency in the dynamic condition. Extensive experiments validate the effectiveness of the proposed VSDM against other state-of-the-art (SOTA) pansharpening methods in both quantitative and qualitative assessments. 
The source code will be released at \n<uri>https://github.com/MELiMZ/VSDM</uri>\n.","PeriodicalId":13213,"journal":{"name":"IEEE Transactions on Geoscience and Remote Sensing","volume":"62 ","pages":"1-12"},"PeriodicalIF":8.6000,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Geoscience and Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10764740/","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Pansharpening aims to obtain a high-spatial-resolution multispectral (MS) image by fusing a lower-spatial-resolution MS image with a high-spatial-resolution panchromatic (PAN) image. The results of most current pansharpening methods still suffer from spatial and spectral distortion. Diffusion models have shown outstanding performance in various image-processing tasks. However, maintaining the full image size throughout the diffusion process imposes a large computational burden, and using PAN and MS images acquired by different sensors simultaneously as the condition for guiding noise prediction leads to spatial and spectral distortions. To solve these problems, a variable-scale diffusion model (VSDM) based on dynamic condition guidance is proposed for pansharpening; it improves fusion performance by changing the manner in which diffusion is performed and by injecting dynamic conditions to guide the reverse process. In VSDM, a variable-scale diffusion manner (VSDMN) is designed to reduce the computational complexity of the model by reducing the size of the image in the diffusion process. A condition generator (CG) is constructed to generate dynamic conditions from features learned from the PAN and upsampled MS images. In CG, a cross-attention dynamic convolution with a spatial and spectral attention mechanism is built to extract features from the PAN image, which can improve the spatial and spectral consistency of the dynamic condition. Extensive experiments validate the effectiveness of the proposed VSDM against other state-of-the-art (SOTA) pansharpening methods in both quantitative and qualitative assessments. The source code will be released at https://github.com/MELiMZ/VSDM.
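The computational argument in the abstract (diffusing a smaller image costs less) can be illustrated with a minimal sketch. This is not the paper's VSDMN, whose details are not given here. It assumes the standard DDPM forward step x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, a linear beta schedule, and plain average pooling for downsampling; the function names `avg_pool`, `alpha_bar`, and `forward_noise` are hypothetical.

```python
import math
import random


def avg_pool(img, factor):
    """Downsample a 2-D list-of-lists image by averaging factor x factor blocks."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[i + di][j + dj] for di in range(factor) for dj in range(factor))
            / factor**2
            for j in range(0, w - w % factor, factor)
        ]
        for i in range(0, h - h % factor, factor)
    ]


def alpha_bar(t, betas):
    """Cumulative product of (1 - beta_s) for s = 0..t (standard DDPM notation)."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def forward_noise(img, t, betas, rng):
    """Sample x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps with eps ~ N(0, 1)."""
    ab = alpha_bar(t, betas)
    return [
        [math.sqrt(ab) * p + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for p in row]
        for row in img
    ]


rng = random.Random(0)
# Toy single-band 64 x 64 "image" with values in [0, 1).
full = [[rng.random() for _ in range(64)] for _ in range(64)]
# Assumed downsampling factor of 4; the paper's actual scales are not stated here.
small = avg_pool(full, 4)
# Assumed linear beta schedule over 1000 steps.
betas = [1e-4 + (0.02 - 1e-4) * s / 999 for s in range(1000)]
noisy_small = forward_noise(small, 500, betas, rng)
# Each per-pixel operation in a diffusion step now touches this many times fewer pixels.
pixel_ratio = (len(full) * len(full[0])) / (len(small) * len(small[0]))
```

With a factor of 4 per side, every noising or denoising step operates on 16 times fewer pixels, which is the kind of saving the abstract attributes to reducing the image size during diffusion.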
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.