TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models
Gilad Deutch, Rinon Gal, Daniel Garibi, Or Patashnik, Daniel Cohen-Or
arXiv:2408.00735 (arXiv - CS - Graphics, 2024-08-01)
{"title":"TurboEdit:使用几步扩散模型进行基于文本的图像编辑","authors":"Gilad Deutch, Rinon Gal, Daniel Garibi, Or Patashnik, Daniel Cohen-Or","doi":"arxiv-2408.00735","DOIUrl":null,"url":null,"abstract":"Diffusion models have opened the path to a wide range of text-based image\nediting frameworks. However, these typically build on the multi-step nature of\nthe diffusion backwards process, and adapting them to distilled, fast-sampling\nmethods has proven surprisingly challenging. Here, we focus on a popular line\nof text-based editing frameworks - the ``edit-friendly'' DDPM-noise inversion\napproach. We analyze its application to fast sampling methods and categorize\nits failures into two classes: the appearance of visual artifacts, and\ninsufficient editing strength. We trace the artifacts to mismatched noise\nstatistics between inverted noises and the expected noise schedule, and suggest\na shifted noise schedule which corrects for this offset. To increase editing\nstrength, we propose a pseudo-guidance approach that efficiently increases the\nmagnitude of edits without introducing new artifacts. All in all, our method\nenables text-based image editing with as few as three diffusion steps, while\nproviding novel insights into the mechanisms behind popular text-based editing\napproaches.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"81 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models\",\"authors\":\"Gilad Deutch, Rinon Gal, Daniel Garibi, Or Patashnik, Daniel Cohen-Or\",\"doi\":\"arxiv-2408.00735\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Diffusion models have opened the path to a wide range of text-based image\\nediting frameworks. However, these typically build on the multi-step nature of\\nthe diffusion backwards process, and adapting them to distilled, fast-sampling\\nmethods has proven surprisingly challenging. Here, we focus on a popular line\\nof text-based editing frameworks - the ``edit-friendly'' DDPM-noise inversion\\napproach. We analyze its application to fast sampling methods and categorize\\nits failures into two classes: the appearance of visual artifacts, and\\ninsufficient editing strength. We trace the artifacts to mismatched noise\\nstatistics between inverted noises and the expected noise schedule, and suggest\\na shifted noise schedule which corrects for this offset. To increase editing\\nstrength, we propose a pseudo-guidance approach that efficiently increases the\\nmagnitude of edits without introducing new artifacts. 
All in all, our method\\nenables text-based image editing with as few as three diffusion steps, while\\nproviding novel insights into the mechanisms behind popular text-based editing\\napproaches.\",\"PeriodicalId\":501174,\"journal\":{\"name\":\"arXiv - CS - Graphics\",\"volume\":\"81 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Graphics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.00735\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.00735","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models
Diffusion models have opened the path to a wide range of text-based image editing frameworks. However, these typically build on the multi-step nature of the backward diffusion process, and adapting them to distilled, fast-sampling methods has proven surprisingly challenging. Here, we focus on a popular line of text-based editing frameworks: the "edit-friendly" DDPM-noise inversion approach.
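To make the setup concrete, below is a minimal sketch of edit-friendly DDPM-noise inversion in the style of Huberman-Spiegelglas and Michaeli's formulation: draw each noisy latent independently from the input image, solve each reverse step for its noise map, then replay the sampler with a new prompt. Everything here is illustrative — `eps_model`, the toy schedule, and all names are placeholders, not this paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_model(x_t, t, prompt):
    """Hypothetical text-conditioned noise predictor (stand-in for a real
    denoiser such as a distilled few-step UNet)."""
    return np.zeros_like(x_t)  # placeholder so the sketch runs end-to-end

# Toy DDPM schedule; indices 0..T-1 play the role of timesteps 1..T.
T = 4
betas = np.linspace(1e-4, 0.4, T)
alphas = 1.0 - betas
abar = np.cumprod(alphas)      # cumulative products \bar{alpha}_t
sigmas = np.sqrt(betas)        # simplified reverse-step std

def posterior_mean(x_t, t, prompt):
    """Mean of the DDPM reverse step p(x_{t-1} | x_t), via eps-prediction."""
    eps = eps_model(x_t, t, prompt)
    x0_hat = (x_t - np.sqrt(1.0 - abar[t]) * eps) / np.sqrt(abar[t])
    return (np.sqrt(abar[t - 1]) * betas[t] * x0_hat
            + np.sqrt(alphas[t]) * (1.0 - abar[t - 1]) * x_t) / (1.0 - abar[t])

def invert(x0, src_prompt):
    """Recover per-step noise maps z_t that make the sampler reproduce x0."""
    # Key trick: draw each x_t *independently* from x0, rather than running a
    # correlated forward chain -- this is what makes the noises edit-friendly.
    xs = [np.sqrt(abar[t]) * x0
          + np.sqrt(1.0 - abar[t]) * rng.standard_normal(x0.shape)
          for t in range(T)]
    zs = [None] * T
    for t in range(T - 1, 0, -1):
        # Solve the reverse step x_{t-1} = mu_t + sigma_t * z_t for z_t.
        zs[t] = (xs[t - 1] - posterior_mean(xs[t], t, src_prompt)) / sigmas[t]
    return xs[-1], zs

def edit(x_T, zs, tgt_prompt):
    """Replay the sampler with the stored z_t but a new target prompt."""
    x = x_T
    for t in range(T - 1, 0, -1):
        x = posterior_mean(x, t, tgt_prompt) + sigmas[t] * zs[t]
    return x

x0 = rng.standard_normal((8, 8))              # toy "image"
x_T, zs = invert(x0, "a photo of a cat")
edited = edit(x_T, zs, "a photo of a tiger")  # reuse noises, swap the prompt
```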
We analyze its application to fast-sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength. We trace the artifacts to a mismatch between the statistics of the inverted noises and those expected by the noise schedule, and suggest a shifted noise schedule that corrects for this offset.
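The abstract does not spell out the correction rule, so the snippet below (continuing the toy setup above) only illustrates the diagnosis: the DDPM step assumes z_t ~ N(0, I), and with few-step samplers the extracted z_t can deviate from that, which is the kind of offset a shifted schedule would compensate for. The check itself is our illustration, not the paper's procedure.

```python
# Hypothetical diagnostic for the noise-statistics mismatch: compare the
# empirical scale of each inverted z_t to the unit variance the schedule
# expects at that step.
for t in range(T - 1, 0, -1):
    print(f"step {t}: std(z_t) = {zs[t].std():.3f} (schedule expects 1.0)")
```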
To increase editing strength, we propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts.
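As a rough illustration of the pseudo-guidance idea, continuing the toy setup above: extrapolate from the source-prompt prediction toward the target-prompt prediction, guidance-style. Applying the extrapolation to the noise prediction, the scale `w`, and the caching detail are our assumptions; the abstract states only that the mechanism amplifies edits efficiently and without new artifacts.

```python
def pseudo_guided_eps(x_t, t, src_prompt, tgt_prompt, w=2.0):
    """CFG-style extrapolation along the edit direction (illustrative only).
    eps_src is available 'for free' if cached during inversion, which is one
    way the amplification could stay cheap."""
    eps_src = eps_model(x_t, t, src_prompt)
    eps_tgt = eps_model(x_t, t, tgt_prompt)
    # w = 1 recovers the plain edit; w > 1 pushes further along tgt - src.
    return eps_src + w * (eps_tgt - eps_src)
```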
All in all, our method enables text-based image editing with as few as three diffusion steps, while providing novel insights into the mechanisms behind popular text-based editing approaches.