{"title":"利用基于结构化知识提炼的多模态去噪扩散概率模型提高图像超分辨率","authors":"Li Huang, JingKe Yan, Min Wang, Qin Wang","doi":"10.1117/1.jei.33.3.033004","DOIUrl":null,"url":null,"abstract":"In the realm of low-resolution (LR) to high-resolution (HR) image reconstruction, denoising diffusion probabilistic models (DDPMs) are recognized for their superior perceptual quality over other generative models, attributed to their adept handling of various degradation factors in LR images, such as noise and blur. However, DDPMs predominantly focus on a single modality in the super-resolution (SR) image reconstruction from LR images, thus overlooking the rich potential information in multimodal data. This lack of integration and comprehensive processing of multimodal data can impede the full utilization of the complementary characteristics of different data types, limiting their effectiveness across a broad range of applications. Moreover, DDPMs require thousands of evaluations to reconstruct high-quality SR images, which significantly impacts their efficiency. In response to these challenges, a novel multimodal DDPM based on structured knowledge distillation (MKDDPM) is introduced. This approach features a multimodal-based DDPM that effectively leverages sparse prior information from another modality, integrated into the MKDDPM network architecture to optimize the solution space and detail features of the reconstructed image. Furthermore, a structured knowledge distillation method is proposed, leveraging a well-trained DDPM and iteratively learning a new DDPM, with each iteration requiring only half the original sampling steps. This method significantly reduces the number of model sampling steps without compromising on sampling quality. Experimental results demonstrate that MKDDPM, even with a substantially reduced number of diffusion steps, still achieves superior performance, providing a novel solution for single-image SR tasks.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"68 1","pages":""},"PeriodicalIF":1.0000,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improving image super-resolution with structured knowledge distillation-based multimodal denoising diffusion probabilistic model\",\"authors\":\"Li Huang, JingKe Yan, Min Wang, Qin Wang\",\"doi\":\"10.1117/1.jei.33.3.033004\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the realm of low-resolution (LR) to high-resolution (HR) image reconstruction, denoising diffusion probabilistic models (DDPMs) are recognized for their superior perceptual quality over other generative models, attributed to their adept handling of various degradation factors in LR images, such as noise and blur. However, DDPMs predominantly focus on a single modality in the super-resolution (SR) image reconstruction from LR images, thus overlooking the rich potential information in multimodal data. This lack of integration and comprehensive processing of multimodal data can impede the full utilization of the complementary characteristics of different data types, limiting their effectiveness across a broad range of applications. Moreover, DDPMs require thousands of evaluations to reconstruct high-quality SR images, which significantly impacts their efficiency. In response to these challenges, a novel multimodal DDPM based on structured knowledge distillation (MKDDPM) is introduced. 
This approach features a multimodal-based DDPM that effectively leverages sparse prior information from another modality, integrated into the MKDDPM network architecture to optimize the solution space and detail features of the reconstructed image. Furthermore, a structured knowledge distillation method is proposed, leveraging a well-trained DDPM and iteratively learning a new DDPM, with each iteration requiring only half the original sampling steps. This method significantly reduces the number of model sampling steps without compromising on sampling quality. Experimental results demonstrate that MKDDPM, even with a substantially reduced number of diffusion steps, still achieves superior performance, providing a novel solution for single-image SR tasks.\",\"PeriodicalId\":54843,\"journal\":{\"name\":\"Journal of Electronic Imaging\",\"volume\":\"68 1\",\"pages\":\"\"},\"PeriodicalIF\":1.0000,\"publicationDate\":\"2024-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Electronic Imaging\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1117/1.jei.33.3.033004\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Electronic Imaging","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1117/1.jei.33.3.033004","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
In the realm of low-resolution (LR) to high-resolution (HR) image reconstruction, denoising diffusion probabilistic models (DDPMs) are recognized for their superior perceptual quality over other generative models, attributed to their adept handling of the various degradation factors in LR images, such as noise and blur. However, DDPMs predominantly focus on a single modality when reconstructing super-resolution (SR) images from LR images, thus overlooking the rich potential information in multimodal data. Without integration and comprehensive processing of multimodal data, the complementary characteristics of different data types cannot be fully exploited, limiting effectiveness across a broad range of applications. Moreover, DDPMs require thousands of network evaluations to reconstruct a high-quality SR image, which significantly reduces their efficiency. In response to these challenges, a novel multimodal DDPM based on structured knowledge distillation (MKDDPM) is introduced. This approach features a multimodal DDPM that effectively leverages sparse prior information from another modality, integrated into the MKDDPM network architecture to optimize the solution space and the detail features of the reconstructed image. Furthermore, a structured knowledge distillation method is proposed that starts from a well-trained DDPM and iteratively learns a new DDPM, with each iteration requiring only half of the original sampling steps. This method significantly reduces the number of sampling steps without compromising sampling quality. Experimental results demonstrate that MKDDPM, even with a substantially reduced number of diffusion steps, still achieves superior performance, providing a novel solution for single-image SR tasks.
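The abstract only sketches the distillation schedule, so below is a minimal, self-contained PyTorch sketch of the general step-halving idea (a simplified variant of progressive distillation): each round trains a student whose single denoising step imitates two consecutive steps of the current teacher, so the number of sampling steps halves per round. The TinyEps network, the DDIM-style update, and the plain MSE objective are illustrative assumptions, not the authors' MKDDPM architecture, multimodal conditioning scheme, or loss.

```python
import copy
import torch
import torch.nn as nn

class TinyEps(nn.Module):
    """Toy noise-prediction network conditioned on the noise level alpha_bar (placeholder only)."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x, a_bar):
        # x: (batch, dim) noisy sample; a_bar: (batch,) cumulative signal level in (0, 1)
        return self.net(torch.cat([x, a_bar[:, None]], dim=-1))

def ddim_step(model, x, a_from, a_to):
    """One deterministic DDIM-style update from noise level a_from to a_to."""
    eps = model(x, a_from)
    x0 = (x - (1 - a_from).sqrt()[:, None] * eps) / a_from.sqrt()[:, None]
    return a_to.sqrt()[:, None] * x0 + (1 - a_to).sqrt()[:, None] * eps

def distill_round(teacher, a_bars, iters=200, batch=64, dim=16):
    """Train a student whose single step imitates two consecutive teacher steps.

    a_bars is the noise schedule on the teacher's grid, ordered from clean
    (index 0, a_bar near 1) to pure noise (last index, a_bar near 0).
    Returns the student together with the halved schedule, so rounds can be chained.
    """
    student = copy.deepcopy(teacher)
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    n = len(a_bars) - 1                                   # teacher sampling steps (even)
    for _ in range(iters):
        x = torch.randn(batch, dim)                       # stand-in for noised training data
        i = 2 * torch.randint(1, n // 2 + 1, (batch,))    # even start indices 2, 4, ..., n
        a0, a1, a2 = a_bars[i], a_bars[i - 1], a_bars[i - 2]
        with torch.no_grad():                             # two teacher steps give the target
            target = ddim_step(teacher, ddim_step(teacher, x, a0, a1), a1, a2)
        pred = ddim_step(student, x, a0, a2)              # one student step must match it
        loss = (pred - target).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student, a_bars[::2]                           # student samples on half the steps

# Chain the rounds, e.g. 8 -> 4 -> 2 sampling steps.
steps = 8
a_bars = torch.linspace(1.0, 0.0, steps + 1).clamp(1e-3, 1 - 1e-3)
model = TinyEps()                                         # stands in for a pre-trained DDPM
for _ in range(2):
    model, a_bars = distill_round(model, a_bars)
```

Chaining distill_round in this way mirrors the abstract's claim that each distillation iteration needs only half the sampling steps, with the teacher in each round being the student produced by the previous one.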
Journal introduction:
The Journal of Electronic Imaging publishes peer-reviewed papers in all technology areas that make up the field of electronic imaging and are normally considered in the design, engineering, and applications of electronic imaging systems.