Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation

arXiv - CS - Computer Vision and Pattern Recognition | Pub Date: 2024-09-12 | DOI: arxiv-2409.08077

Junsung Lee, Minsoo Kang, Bohyung Han

Abstract

We propose a simple but effective training-free approach tailored to diffusion-based image-to-image translation. Our approach revises the original noise prediction network of a pretrained diffusion model by introducing a noise correction term. We formulate the noise correction term as the difference between two noise predictions; one is computed from the denoising network with a progressive interpolation of the source and target prompt embeddings, while the other is the noise prediction with the source prompt embedding. The final noise prediction network is given by a linear combination of the standard denoising term and the noise correction term, where the former is designed to reconstruct must-be-preserved regions while the latter aims to effectively edit regions of interest relevant to the target prompt. Our approach can be easily incorporated into existing image-to-image translation methods based on diffusion models. Extensive experiments verify that the proposed technique achieves outstanding performance with low latency and consistently improves existing frameworks when combined with them.
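
The combination described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the interpolation schedule, the weighting, or the conditioning of the standard denoising term, so the linear ramp `beta_schedule`, the scalar weight `gamma`, the choice of the source-prompt prediction as the standard term, and the stand-in network `toy_eps` are all hypothetical.

```python
import numpy as np


def beta_schedule(step, num_steps):
    """Hypothetical linear ramp: 0 at the first denoising step, 1 at the last.

    The abstract only says the interpolation is "progressive"; the exact
    schedule is an assumption made for this sketch.
    """
    return step / max(num_steps - 1, 1)


def interpolate_prompt(src_emb, tgt_emb, beta):
    """Linear interpolation of source and target prompt embeddings."""
    return (1.0 - beta) * src_emb + beta * tgt_emb


def corrected_noise(eps_fn, x_t, t, src_emb, tgt_emb, beta, gamma):
    """Standard denoising term plus a weighted noise correction term.

    Assumption: the standard term is the source-prompt noise prediction,
    and `gamma` is a hypothetical scalar weight on the correction.
    """
    eps_src = eps_fn(x_t, t, src_emb)
    eps_mix = eps_fn(x_t, t, interpolate_prompt(src_emb, tgt_emb, beta))
    # Correction term: difference between the two noise predictions.
    correction = eps_mix - eps_src
    return eps_src + gamma * correction


def toy_eps(x_t, t, emb):
    """Toy stand-in for a pretrained noise prediction network (not a real model)."""
    return 0.1 * x_t + 0.01 * t + emb.mean()
```

In a real pipeline, `eps_fn` would wrap the pretrained diffusion model's noise predictor, and `beta` would be recomputed at every denoising step. With `beta = 0` the correction vanishes and the prediction reduces to the source-prompt term, which matches the abstract's description of the standard term as the part that preserves the source image.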
