Hanzhang Wang, Deming Zhai, Xiong Zhou, Junjun Jiang, Xianming Liu
{"title":"Mix-DDPM:通过全局随机偏移拟合混合噪声增强扩散模型","authors":"Hanzhang Wang, Deming Zhai, Xiong Zhou, Junjun Jiang, Xianming Liu","doi":"10.1145/3672080","DOIUrl":null,"url":null,"abstract":"<p>Denoising diffusion probabilistic models (DDPM) have shown impressive performance in various domains as a class of deep generative models. In this paper, we introduce the Mixture noise-based DDPM (Mix-DDPM), which considers the Markov diffusion posterior as a Gaussian mixture model. Specifically, Mix-DDPM randomly selects a Gaussian component and then adds the chosen Gaussian noise, which can be demonstrated as a more efficient way to perturb the signals into a simple known distribution. We further define the reverse probabilistic model as a parameterized Gaussian mixture kernel. Due to the intractability in calculating the KL divergence between Gaussian mixture models, we derive a variational bound to maximize the likelihood, offering a concise formulation for optimizing the denoising model and valuable insights for designing the sampling strategies. Our theoretical derivation highlights that <i>Mix-DDPM need only shift image which requires the inclusion of a global stochastic offset in both the diffusion and reverse processes</i>, which can be efficiently implemented with just several lines of code. The global stochastic offset effectively fits a Gaussian mixture distribution enhancing the degrees of freedom of the entire diffusion model. Furthermore, we present three streamlined sampling strategies that interface with diverse fast dedicated solvers for diffusion ordinary differential equations, boosting the efficacy of image representation in the sampling phase and alleviating the issue of slow generation speed, thereby enhancing both efficiency and accuracy. Extensive experiments on benchmark datasets demonstrate the effectiveness of Mix-DDPM and its superiority over the original DDPM.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"11 1","pages":""},"PeriodicalIF":5.2000,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Mix-DDPM: Enhancing Diffusion Models through Fitting Mixture Noise with Global Stochastic Offset\",\"authors\":\"Hanzhang Wang, Deming Zhai, Xiong Zhou, Junjun Jiang, Xianming Liu\",\"doi\":\"10.1145/3672080\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Denoising diffusion probabilistic models (DDPM) have shown impressive performance in various domains as a class of deep generative models. In this paper, we introduce the Mixture noise-based DDPM (Mix-DDPM), which considers the Markov diffusion posterior as a Gaussian mixture model. Specifically, Mix-DDPM randomly selects a Gaussian component and then adds the chosen Gaussian noise, which can be demonstrated as a more efficient way to perturb the signals into a simple known distribution. We further define the reverse probabilistic model as a parameterized Gaussian mixture kernel. Due to the intractability in calculating the KL divergence between Gaussian mixture models, we derive a variational bound to maximize the likelihood, offering a concise formulation for optimizing the denoising model and valuable insights for designing the sampling strategies. 
Our theoretical derivation highlights that <i>Mix-DDPM need only shift image which requires the inclusion of a global stochastic offset in both the diffusion and reverse processes</i>, which can be efficiently implemented with just several lines of code. The global stochastic offset effectively fits a Gaussian mixture distribution enhancing the degrees of freedom of the entire diffusion model. Furthermore, we present three streamlined sampling strategies that interface with diverse fast dedicated solvers for diffusion ordinary differential equations, boosting the efficacy of image representation in the sampling phase and alleviating the issue of slow generation speed, thereby enhancing both efficiency and accuracy. Extensive experiments on benchmark datasets demonstrate the effectiveness of Mix-DDPM and its superiority over the original DDPM.</p>\",\"PeriodicalId\":50937,\"journal\":{\"name\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"volume\":\"11 1\",\"pages\":\"\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2024-06-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3672080\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Multimedia Computing Communications and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3672080","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Mix-DDPM: Enhancing Diffusion Models through Fitting Mixture Noise with Global Stochastic Offset
Denoising diffusion probabilistic models (DDPMs) have shown impressive performance across a range of domains as a class of deep generative models. In this paper, we introduce the mixture-noise-based DDPM (Mix-DDPM), which models the Markov diffusion posterior as a Gaussian mixture. Specifically, Mix-DDPM randomly selects a Gaussian component and then adds noise drawn from that component, which we demonstrate to be a more efficient way to perturb the signal toward a simple known distribution. We further define the reverse probabilistic model as a parameterized Gaussian mixture kernel. Because the KL divergence between Gaussian mixture models is intractable, we derive a variational bound on the likelihood, yielding a concise objective for optimizing the denoising model and valuable insight for designing sampling strategies. Our theoretical derivation highlights that Mix-DDPM only needs to shift the image, which requires including a global stochastic offset in both the diffusion and reverse processes and can be implemented efficiently in just a few lines of code. The global stochastic offset effectively fits a Gaussian mixture distribution, increasing the degrees of freedom of the entire diffusion model. Furthermore, we present three streamlined sampling strategies that interface with diverse fast dedicated solvers for diffusion ordinary differential equations, improving image representation in the sampling phase and alleviating slow generation, thereby enhancing both efficiency and accuracy. Extensive experiments on benchmark datasets demonstrate the effectiveness of Mix-DDPM and its superiority over the original DDPM.
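Since the abstract notes that the global stochastic offset can be implemented in just a few lines of code, a minimal PyTorch-style sketch may help make the idea concrete. The sketch below is an assumption-laden illustration, not the paper's implementation: it assumes the K mixture components share identity covariance and differ only in their means ("offsets"), that components are selected uniformly at random, and the function name and signature are hypothetical.

# Illustrative sketch of a mixture-noise forward (diffusion) step.
# Assumptions not taken from the paper: the K mixture components share
# identity covariance and differ only in their means ("offsets"), and
# components are chosen uniformly at random. Names are hypothetical.
import torch

def mix_forward_step(x0: torch.Tensor,
                     alpha_bar_t: torch.Tensor,
                     offsets: torch.Tensor):
    """Perturb a clean batch x0 toward timestep t using mixture noise.

    x0          -- clean images, shape (B, C, H, W)
    alpha_bar_t -- cumulative noise-schedule values, broadcastable to x0,
                   e.g. shape (B, 1, 1, 1)
    offsets     -- means of the K Gaussian components, shape (K, C, H, W)
    """
    # Randomly pick one mixture component per sample (uniform prior assumed).
    k = torch.randint(offsets.shape[0], (x0.shape[0],), device=x0.device)

    # Standard Gaussian noise shifted by the chosen component's mean: this
    # shift plays the role of the "global stochastic offset" described in
    # the abstract, added on top of the ordinary DDPM perturbation.
    eps = torch.randn_like(x0) + offsets[k]

    # Usual DDPM reparameterization of the forward marginal.
    x_t = alpha_bar_t.sqrt() * x0 + (1.0 - alpha_bar_t).sqrt() * eps
    return x_t, eps, k

Relative to a standard DDPM training step, the only change in this sketch is the "+ offsets[k]" term; the component index k would also be tracked so that the reverse process can account for the same offset, consistent with the abstract's claim that the modification amounts to a few lines of code.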
Journal Description:
The ACM Transactions on Multimedia Computing, Communications, and Applications is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It is soliciting paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome.
TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles per issue. In addition, all Special Issues are published online-only to ensure timely publication. The transactions consist primarily of research papers. As an archival journal, it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.