InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models

arXiv - CS - Computer Vision and Pattern Recognition, Pub Date: 2024-09-18, DOI: arxiv-2409.11734

Yan Zheng, Lemeng Wu

Abstract

In this paper, we introduce Geometry-Inverse-Meet-Pixel-Insert (GEO for short), an exceptionally versatile image editing technique designed to cater to customized user requirements at both local and global scales. Our approach seamlessly integrates text prompts and image prompts to yield diverse and precise editing outcomes. Notably, our method operates without the need for training and is driven by two key contributions: (i) a novel geometric accumulation loss that enhances DDIM inversion to faithfully preserve pixel space geometry and layout, and (ii) an innovative boosted image prompt technique that combines pixel-level editing for text-only inversion with latent space geometry guidance for standard classifier-free reversion. Leveraging the publicly available Stable Diffusion model, our approach undergoes extensive evaluation across various image types and challenging prompt editing scenarios, consistently delivering high-fidelity editing results for real images.
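For readers unfamiliar with the two building blocks the abstract names, the following is a minimal sketch of a plain deterministic DDIM step (usable for both inversion and sampling) and of classifier-free guidance. It is not the authors' method: the geometric accumulation loss and the boosted image prompt are not specified in the abstract and are not implemented here. The function names `cfg_noise` and `ddim_step`, the list-of-floats representation, and the example alpha values are assumptions made for illustration only.

```python
import math


def cfg_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: move from the unconditional noise
    prediction toward the conditional one by guidance_scale."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def ddim_step(x, eps, alpha_from, alpha_to):
    """One deterministic DDIM update (eta = 0) between two cumulative
    alpha values. Inversion moves toward a smaller alpha (more noise),
    sampling moves toward a larger alpha (less noise)."""
    out = []
    for xi, ei in zip(x, eps):
        # Estimate of the clean sample implied by the current noise prediction.
        x0_pred = (xi - math.sqrt(1.0 - alpha_from) * ei) / math.sqrt(alpha_from)
        # Re-noise that estimate to the target noise level.
        out.append(math.sqrt(alpha_to) * x0_pred + math.sqrt(1.0 - alpha_to) * ei)
    return out


# Toy values standing in for a latent and for a model's noise predictions
# (hypothetical numbers, not taken from the paper).
latent = [0.5, -1.0, 2.0]
eps = cfg_noise([0.1, 0.2, -0.3], [0.3, 0.0, -0.1], guidance_scale=7.5)

inverted = ddim_step(latent, eps, alpha_from=0.9, alpha_to=0.8)    # inversion
restored = ddim_step(inverted, eps, alpha_from=0.8, alpha_to=0.9)  # sampling
```

In this toy setting the same noise prediction is reused in both directions, so the round trip is exact. With a real network the prediction differs between the inversion and sampling passes, especially under guidance, which is the kind of reconstruction drift that inversion-enhancing losses aim to reduce.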
