Mix-DDPM: Enhancing Diffusion Models through Fitting Mixture Noise with Global Stochastic Offset

IF 5.2 3区 计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS ACM Transactions on Multimedia Computing Communications and Applications Pub Date : 2024-06-07 DOI:10.1145/3672080
Hanzhang Wang, Deming Zhai, Xiong Zhou, Junjun Jiang, Xianming Liu
{"title":"Mix-DDPM: Enhancing Diffusion Models through Fitting Mixture Noise with Global Stochastic Offset","authors":"Hanzhang Wang, Deming Zhai, Xiong Zhou, Junjun Jiang, Xianming Liu","doi":"10.1145/3672080","DOIUrl":null,"url":null,"abstract":"<p>Denoising diffusion probabilistic models (DDPM) have shown impressive performance in various domains as a class of deep generative models. In this paper, we introduce the Mixture noise-based DDPM (Mix-DDPM), which considers the Markov diffusion posterior as a Gaussian mixture model. Specifically, Mix-DDPM randomly selects a Gaussian component and then adds the chosen Gaussian noise, which can be demonstrated as a more efficient way to perturb the signals into a simple known distribution. We further define the reverse probabilistic model as a parameterized Gaussian mixture kernel. Due to the intractability in calculating the KL divergence between Gaussian mixture models, we derive a variational bound to maximize the likelihood, offering a concise formulation for optimizing the denoising model and valuable insights for designing the sampling strategies. Our theoretical derivation highlights that <i>Mix-DDPM need only shift image which requires the inclusion of a global stochastic offset in both the diffusion and reverse processes</i>, which can be efficiently implemented with just several lines of code. The global stochastic offset effectively fits a Gaussian mixture distribution enhancing the degrees of freedom of the entire diffusion model. Furthermore, we present three streamlined sampling strategies that interface with diverse fast dedicated solvers for diffusion ordinary differential equations, boosting the efficacy of image representation in the sampling phase and alleviating the issue of slow generation speed, thereby enhancing both efficiency and accuracy. Extensive experiments on benchmark datasets demonstrate the effectiveness of Mix-DDPM and its superiority over the original DDPM.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"11 1","pages":""},"PeriodicalIF":5.2000,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Multimedia Computing Communications and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3672080","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Denoising diffusion probabilistic models (DDPM) have shown impressive performance in various domains as a class of deep generative models. In this paper, we introduce the Mixture noise-based DDPM (Mix-DDPM), which considers the Markov diffusion posterior as a Gaussian mixture model. Specifically, Mix-DDPM randomly selects a Gaussian component and then adds the chosen Gaussian noise, which can be demonstrated as a more efficient way to perturb the signals into a simple known distribution. We further define the reverse probabilistic model as a parameterized Gaussian mixture kernel. Due to the intractability in calculating the KL divergence between Gaussian mixture models, we derive a variational bound to maximize the likelihood, offering a concise formulation for optimizing the denoising model and valuable insights for designing the sampling strategies. Our theoretical derivation highlights that Mix-DDPM need only shift image which requires the inclusion of a global stochastic offset in both the diffusion and reverse processes, which can be efficiently implemented with just several lines of code. The global stochastic offset effectively fits a Gaussian mixture distribution enhancing the degrees of freedom of the entire diffusion model. Furthermore, we present three streamlined sampling strategies that interface with diverse fast dedicated solvers for diffusion ordinary differential equations, boosting the efficacy of image representation in the sampling phase and alleviating the issue of slow generation speed, thereby enhancing both efficiency and accuracy. Extensive experiments on benchmark datasets demonstrate the effectiveness of Mix-DDPM and its superiority over the original DDPM.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Mix-DDPM:通过全局随机偏移拟合混合噪声增强扩散模型
去噪扩散概率模型(DDPM)作为一类深度生成模型,在各个领域都表现出令人印象深刻的性能。本文介绍了基于混合噪声的去噪扩散概率模型(Mix-DDPM),它将马尔可夫扩散后验视为高斯混合模型。具体来说,Mix-DDPM 随机选择一个高斯分量,然后添加所选的高斯噪声,这可以证明是将信号扰动为简单已知分布的一种更有效的方法。我们进一步将反向概率模型定义为参数化高斯混合核。由于计算高斯混合物模型之间的 KL 发散很困难,我们推导出了一个变分约束来最大化似然,为优化去噪模型提供了一个简洁的表述,并为设计采样策略提供了宝贵的见解。我们的理论推导强调,Mix-DDPM 只需移动图像,这就要求在扩散和反向过程中加入全局随机偏移,而这只需几行代码就能高效实现。全局随机偏移有效地拟合了高斯混合分布,增强了整个扩散模型的自由度。此外,我们还提出了三种精简的采样策略,这些策略可与各种快速的扩散常微分方程专用求解器对接,提高了采样阶段的图像表示效率,并缓解了生成速度慢的问题,从而提高了效率和准确性。在基准数据集上进行的大量实验证明了 Mix-DDPM 的有效性及其优于原始 DDPM 的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
8.50
自引率
5.90%
发文量
285
审稿时长
7.5 months
期刊介绍: The ACM Transactions on Multimedia Computing, Communications, and Applications is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It is soliciting paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print form and digital form. The Journal is published quarterly; with roughly 7 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure a timely publication. The transactions consists primarily of research papers. This is an archival journal and it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.
期刊最新文献
TA-Detector: A GNN-based Anomaly Detector via Trust Relationship KF-VTON: Keypoints-Driven Flow Based Virtual Try-On Network Unified View Empirical Study for Large Pretrained Model on Cross-Domain Few-Shot Learning Multimodal Fusion for Talking Face Generation Utilizing Speech-related Facial Action Units Compressed Point Cloud Quality Index by Combining Global Appearance and Local Details
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1