Mix-DDPM: Enhancing Diffusion Models through Fitting Mixture Noise with Global Stochastic Offset

IF 5.2 · Zone 3, Computer Science · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · ACM Transactions on Multimedia Computing Communications and Applications · Pub Date: 2024-06-07 · DOI: 10.1145/3672080

Hanzhang Wang, Deming Zhai, Xiong Zhou, Junjun Jiang, Xianming Liu


Citations: 0

Abstract

Denoising diffusion probabilistic models (DDPM) have shown impressive performance in various domains as a class of deep generative models. In this paper, we introduce the Mixture noise-based DDPM (Mix-DDPM), which treats the Markov diffusion posterior as a Gaussian mixture model. Specifically, Mix-DDPM randomly selects a Gaussian component and then adds the chosen Gaussian noise, which can be shown to be a more efficient way to perturb signals into a simple known distribution. We further define the reverse probabilistic model as a parameterized Gaussian mixture kernel. Because the KL divergence between Gaussian mixture models is intractable, we derive a variational bound to maximize the likelihood, offering a concise formulation for optimizing the denoising model and valuable insights for designing sampling strategies. Our theoretical derivation shows that Mix-DDPM only needs to shift the image: it requires including a global stochastic offset in both the diffusion and reverse processes, which can be implemented efficiently in a few lines of code. The global stochastic offset effectively fits a Gaussian mixture distribution, increasing the degrees of freedom of the entire diffusion model. Furthermore, we present three streamlined sampling strategies that interface with a range of fast dedicated solvers for diffusion ordinary differential equations, improving image representation in the sampling phase and alleviating slow generation, thereby enhancing both efficiency and accuracy. Extensive experiments on benchmark datasets demonstrate the effectiveness of Mix-DDPM and its superiority over the original DDPM.
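The abstract gives no formulas, so the following is only a rough sketch of what "select a Gaussian component, then add the chosen Gaussian noise" could look like in a DDPM-style forward step. The function name `mixture_forward_sample`, the `component_means` parameter, the uniform component choice, and the way the offset is scaled are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def mixture_forward_sample(x0, alpha_bar_t, component_means, rng):
    """Sample x_t from x_0 using noise drawn from a Gaussian mixture.

    Assumptions (hypothetical, not from the paper): each image gets ONE
    scalar offset, chosen uniformly from `component_means`, and the offset
    is scaled by the same factor as the standard DDPM noise term.
    """
    batch = x0.shape[0]
    # Randomly select one mixture component per image.
    idx = rng.integers(0, len(component_means), size=batch)
    # "Global" offset: a single scalar per image, broadcast over all pixels.
    means = np.asarray(component_means, dtype=x0.dtype)
    offset = means[idx].reshape(batch, *([1] * (x0.ndim - 1)))
    # Unit-variance Gaussian noise centered on the chosen component mean.
    eps = rng.standard_normal(x0.shape).astype(x0.dtype)
    noise = eps + offset
    # Standard DDPM closed-form perturbation, applied with the shifted noise.
    xt = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise
    return xt, noise, offset

rng = np.random.default_rng(0)
x0 = np.zeros((4, 3, 8, 8), dtype=np.float32)  # dummy batch of images
xt, noise, offset = mixture_forward_sample(x0, 0.5, [-0.5, 0.5], rng)
print(xt.shape, offset.reshape(-1))
```

Under these assumptions the only change relative to a plain DDPM forward step is the `+ offset` line, which is one possible reading of the abstract's remark that the method needs only a few lines of code.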




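Source journal

ACM Transactions on Multimedia Computing Communications and Applications (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore: 8.50

Self-citation rate: 5.90%

Articles published: 285

Review time: 7.5 months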

About the journal: The ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It solicits paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure timely publication. The Transactions consists primarily of research papers. This is an archival journal, and its papers are intended to have lasting importance and value. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.