Scribble-Guided Diffusion for Training-free Text-to-Image Generation

Seonho Lee, Jiho Choi, Seohyun Lim, Jiwook Kim, Hyunjung Shim
{"title":"Scribble-Guided Diffusion for Training-free Text-to-Image Generation","authors":"Seonho Lee, Jiho Choi, Seohyun Lim, Jiwook Kim, Hyunjung Shim","doi":"arxiv-2409.08026","DOIUrl":null,"url":null,"abstract":"Recent advancements in text-to-image diffusion models have demonstrated\nremarkable success, yet they often struggle to fully capture the user's intent.\nExisting approaches using textual inputs combined with bounding boxes or region\nmasks fall short in providing precise spatial guidance, often leading to\nmisaligned or unintended object orientation. To address these limitations, we\npropose Scribble-Guided Diffusion (ScribbleDiff), a training-free approach that\nutilizes simple user-provided scribbles as visual prompts to guide image\ngeneration. However, incorporating scribbles into diffusion models presents\nchallenges due to their sparse and thin nature, making it difficult to ensure\naccurate orientation alignment. To overcome these challenges, we introduce\nmoment alignment and scribble propagation, which allow for more effective and\nflexible alignment between generated images and scribble inputs. Experimental\nresults on the PASCAL-Scribble dataset demonstrate significant improvements in\nspatial control and consistency, showcasing the effectiveness of scribble-based\nguidance in diffusion models. Our code is available at\nhttps://github.com/kaist-cvml-lab/scribble-diffusion.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":"22 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.08026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Recent advancements in text-to-image diffusion models have demonstrated remarkable success, yet they often struggle to fully capture the user's intent. Existing approaches using textual inputs combined with bounding boxes or region masks fall short in providing precise spatial guidance, often leading to misaligned or unintended object orientation. To address these limitations, we propose Scribble-Guided Diffusion (ScribbleDiff), a training-free approach that utilizes simple user-provided scribbles as visual prompts to guide image generation. However, incorporating scribbles into diffusion models presents challenges due to their sparse and thin nature, making it difficult to ensure accurate orientation alignment. To overcome these challenges, we introduce moment alignment and scribble propagation, which allow for more effective and flexible alignment between generated images and scribble inputs. Experimental results on the PASCAL-Scribble dataset demonstrate significant improvements in spatial control and consistency, showcasing the effectiveness of scribble-based guidance in diffusion models. Our code is available at https://github.com/kaist-cvml-lab/scribble-diffusion.
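The abstract names two techniques, moment alignment and scribble propagation, without spelling out the math. As a rough illustration of the moment-matching idea, the sketch below computes the centroid and principal-axis orientation of a 2D mask from its standard image moments and penalizes the mismatch between a cross-attention map and a scribble mask. This is a hedged reconstruction, not the authors' implementation: the function names (`image_moments`, `moment_alignment_loss`) and the exact loss form are hypothetical, and how the loss hooks into the denoising loop is not shown; see the linked repository for the actual method.

```python
# Minimal sketch of a moment-alignment loss (illustrative, NOT the
# authors' code): match the centroid and principal-axis orientation
# of a cross-attention map to those of a user scribble mask.
import torch

def image_moments(mask: torch.Tensor):
    """Centroid and orientation of a soft 2D mask (H, W) via image moments."""
    h, w = mask.shape
    ys = torch.arange(h, dtype=mask.dtype, device=mask.device)
    xs = torch.arange(w, dtype=mask.dtype, device=mask.device)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    m00 = mask.sum().clamp_min(1e-8)            # total mass
    cx = (mask * xx).sum() / m00                 # centroid x
    cy = (mask * yy).sum() / m00                 # centroid y
    mu20 = (mask * (xx - cx) ** 2).sum() / m00   # central second moments
    mu02 = (mask * (yy - cy) ** 2).sum() / m00
    mu11 = (mask * (xx - cx) * (yy - cy)).sum() / m00
    # principal-axis angle of the mass distribution
    theta = 0.5 * torch.atan2(2 * mu11, mu20 - mu02)
    return cx, cy, theta

def moment_alignment_loss(attn_map: torch.Tensor, scribble: torch.Tensor):
    """Penalize centroid/orientation mismatch between an attention map
    and a scribble mask (both (H, W), values in [0, 1]). Hypothetical."""
    acx, acy, atheta = image_moments(attn_map)
    scx, scy, stheta = image_moments(scribble)
    pos = (acx - scx) ** 2 + (acy - scy) ** 2
    # sin^2 wraps the angle so 0 and pi (the same axis) are equivalent
    ang = torch.sin(atheta - stheta) ** 2
    return pos + ang
```

In a training-free setting, a loss like this would typically be evaluated on the cross-attention maps at each denoising step and its gradient used to nudge the latents, but the specific guidance schedule here is an assumption rather than something stated in the abstract.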