Jiang Qin;Kai Wang;Bin Zou;Lamei Zhang;Joost van de Weijer
IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-14. Published 2024-11-05. DOI: 10.1109/TGRS.2024.3491826. URL: https://ieeexplore.ieee.org/document/10744588/ (Impact factor: 8.6; JCR: Q1, Engineering, Electrical & Electronic; not open access.)
Conditional Diffusion Model With Spatial-Frequency Refinement for SAR-to-Optical Image Translation
Speckle and geometric distortions make synthetic aperture radar (SAR) images difficult to interpret visually. SAR-to-optical (S2O) image translation offers a feasible solution and has attracted increasing attention. Because of the substantial gap between optical and SAR images, however, current S2O translation methods tend to produce geometric distortions, missing targets, and low-fidelity images, which limits subsequent cross-modal applications. In this article, we propose an augmented conditional denoising diffusion probabilistic model with spatial-frequency refinement (SFDiff) for high-fidelity S2O image translation. SFDiff progressively narrows the gap between synthesized and real images in both the spatial and frequency domains, improving both quality and consistency. Specifically, to incorporate the rich spatial content priors provided by SAR images, we design an SAR context prior extractor (SCPE) with denoising enhancement that extracts multiscale conditional representations, helping SFDiff capture more descriptive cues for S2O translation. In addition, a spatial-frequency complementary learning (SFCL) module is designed to learn spatial semantics while simultaneously enhancing informative frequency components and global dependencies. Furthermore, SFDiff is optimized with a joint spatial-frequency refinement loss, which drives iterative refinement in both the spatial and frequency domains to improve content consistency and fidelity in the synthesized images. Experiments on the UNICORN and SEN12 datasets show that SFDiff maintains a high level of content and structural consistency and yields visually appealing translation results that surpass state-of-the-art (SOTA) methods. In particular, SFDiff performs well at preserving small targets and fine details, which is crucial in cross-modal detection applications.
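The abstract does not give the form of the joint spatial-frequency refinement loss, so the following is only a minimal sketch of the general idea, assuming a pixel-wise L1 term plus an L1 term on the 2-D Fourier spectra. The function name `spatial_frequency_loss`, the weight `freq_weight`, and the choice of L1 distances are illustrative assumptions, not the authors' definition.

```python
import numpy as np


def spatial_frequency_loss(pred, target, freq_weight=0.1):
    """Hypothetical joint loss: spatial L1 plus weighted L1 between 2-D spectra.

    This is an illustrative stand-in, not the loss used by SFDiff.
    `pred` and `target` are arrays whose last two axes are (height, width).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Spatial term: mean absolute pixel difference.
    spatial = np.mean(np.abs(pred - target))

    # Frequency term: mean magnitude of the difference between the complex
    # 2-D spectra, using orthonormal FFT scaling over the last two axes.
    pred_f = np.fft.fft2(pred, axes=(-2, -1), norm="ortho")
    target_f = np.fft.fft2(target, axes=(-2, -1), norm="ortho")
    frequency = np.mean(np.abs(pred_f - target_f))

    return spatial + freq_weight * frequency


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    optical = rng.random((3, 8, 8))                       # stand-in "real" optical image
    synthesized = optical + 0.05 * rng.standard_normal((3, 8, 8))
    print(spatial_frequency_loss(synthesized, optical))   # small positive value
    print(spatial_frequency_loss(optical, optical))       # 0.0 for identical inputs
```

In a sketch like this, the frequency term penalizes errors that are spread thinly across pixels but concentrated in a few frequency bands (for example, missing high-frequency detail), which is the kind of complementary signal the abstract attributes to refinement in both domains.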
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.