Efficient End-to-End Diffusion Model for One-Step SAR-to-Optical Translation

Jiang Qin, Bin Zou, Haolin Li, Lamei Zhang
{"title":"Efficient End-to-End Diffusion Model for One-Step SAR-to-Optical Translation","authors":"Jiang Qin;Bin Zou;Haolin Li;Lamei Zhang","doi":"10.1109/LGRS.2024.3506566","DOIUrl":null,"url":null,"abstract":"The undesirable distortions of synthetic aperture radar (SAR) images pose a challenge to intuitive SAR interpretation. SAR-to-optical (S2O) image translation provides a feasible solution for easier interpretation of SAR and supports multisensor analysis. Currently, diffusion-based S2O models are emerging and have achieved remarkable performance in terms of perceptual metrics and fidelity. However, the numerous iterative sampling steps and slow inference speed of these diffusion models (DMs) limit their potential for practical applications. In this letter, an efficient end-to-end diffusion model (E3Diff) is developed for real-time one-step S2O translation. E3Diff not only samples as fast as generative adversarial network (GAN) models, but also retains the powerful image synthesis performance of DMs to achieve high-quality S2O translation in an end-to-end manner. To be specific, SAR spatial priors are first incorporated to provide enriched conditional clues and achieve more precise control from the feature level to synthesize optical images. Then, E3Diff is accelerated by a hybrid refinement loss, which effectively integrates the advantages of both GAN and diffusion components to achieve efficient one-step sampling. Experiments show that E3Diff achieves real-time inference speed (0.17 s per image on an A6000 GPU) and demonstrates significant image-quality improvements (35% and 27% improvement in Frechet inception distance (FID) on the UNICORN and SEN12 dataset, respectively) compared to existing state-of-the-art (SOTA) diffusion S2O methods. This advancement of E3Diff highlights its potential to enhance SAR interpretation and cross-modal applications. The code is available at \n<uri>https://github.com/DeepSARRS/E</uri>\n3Diff.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4000,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10767752/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The undesirable distortions of synthetic aperture radar (SAR) images pose a challenge to intuitive SAR interpretation. SAR-to-optical (S2O) image translation offers a feasible path to easier SAR interpretation and supports multisensor analysis. Diffusion-based S2O models are emerging and have achieved remarkable performance in terms of perceptual metrics and fidelity. However, the many iterative sampling steps and slow inference of these diffusion models (DMs) limit their practical applicability. In this letter, an efficient end-to-end diffusion model (E3Diff) is developed for real-time one-step S2O translation. E3Diff not only samples as fast as generative adversarial network (GAN) models but also retains the powerful image-synthesis capability of DMs, achieving high-quality S2O translation in an end-to-end manner. Specifically, SAR spatial priors are first incorporated to provide enriched conditional clues and exert more precise feature-level control when synthesizing optical images. E3Diff is then accelerated by a hybrid refinement loss, which combines the strengths of the GAN and diffusion components to enable efficient one-step sampling. Experiments show that E3Diff achieves real-time inference (0.17 s per image on an A6000 GPU) and delivers significant image-quality gains over existing state-of-the-art (SOTA) diffusion S2O methods (35% and 27% improvements in Fréchet inception distance (FID) on the UNICORN and SEN12 datasets, respectively). These results highlight E3Diff's potential to enhance SAR interpretation and cross-modal applications. The code is available at https://github.com/DeepSARRS/E3Diff.
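The abstract names two mechanisms behind one-step translation: a SAR image used as a spatial prior to condition the generator, and a hybrid refinement loss that blends a diffusion-style reconstruction term with an adversarial term. The sketch below is a minimal PyTorch illustration of that setup, not the authors' implementation: TinyUNet, PatchDiscriminator, hybrid_refinement_loss, and the lambda_adv weight are all hypothetical stand-ins; the actual architecture and loss are in the linked repository.

```python
# Hypothetical sketch of one-step conditional sampling with a hybrid
# (reconstruction + adversarial) loss. Module names and the loss form are
# illustrative assumptions, not the E3Diff implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyUNet(nn.Module):
    """Stand-in for the diffusion backbone: maps (noise, SAR prior) -> optical image."""

    def __init__(self, ch: int = 32):
        super().__init__()
        # Input: 1-channel noise concatenated with a 1-channel SAR condition.
        self.net = nn.Sequential(
            nn.Conv2d(2, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, 3, 3, padding=1),  # 3-channel optical output
        )

    def forward(self, z: torch.Tensor, sar: torch.Tensor) -> torch.Tensor:
        # Concatenating the SAR image injects the spatial prior at the input.
        return self.net(torch.cat([z, sar], dim=1))


class PatchDiscriminator(nn.Module):
    """PatchGAN-style critic supplying the adversarial half of the hybrid loss."""

    def __init__(self, ch: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, 1, 4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def hybrid_refinement_loss(gen, disc, sar, optical, lambda_adv: float = 0.1):
    """One possible reading of a 'hybrid refinement loss': a reconstruction term
    standing in for the diffusion objective, plus an adversarial term on the
    one-step sample."""
    z = torch.randn_like(sar)       # single noise draw: one-step sampling
    fake = gen(z, sar)              # one forward pass, no iterative refinement
    rec = F.l1_loss(fake, optical)  # diffusion-side reconstruction surrogate
    adv = -disc(fake).mean()        # non-saturating generator objective
    return rec + lambda_adv * adv, fake


if __name__ == "__main__":
    gen, disc = TinyUNet(), PatchDiscriminator()
    sar = torch.randn(2, 1, 64, 64)      # toy SAR batch
    optical = torch.randn(2, 3, 64, 64)  # toy paired optical batch
    loss, sample = hybrid_refinement_loss(gen, disc, sar, optical)
    print(loss.item(), sample.shape)     # scalar loss, (2, 3, 64, 64) sample
```

Running the sketch produces a sample from a single network evaluation, which is the property that lets a one-step diffusion model match GAN-speed inference; the real system's reported 0.17 s per image on an A6000 GPU reflects that single-pass regime.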