{"title":"Click2Mask: Local Editing with Dynamic Mask Generation","authors":"Omer Regev, Omri Avrahami, Dani Lischinski","doi":"arxiv-2409.08272","DOIUrl":null,"url":null,"abstract":"Recent advancements in generative models have revolutionized image generation\nand editing, making these tasks accessible to non-experts. This paper focuses\non local image editing, particularly the task of adding new content to a\nloosely specified area. Existing methods often require a precise mask or a\ndetailed description of the location, which can be cumbersome and prone to\nerrors. We propose Click2Mask, a novel approach that simplifies the local\nediting process by requiring only a single point of reference (in addition to\nthe content description). A mask is dynamically grown around this point during\na Blended Latent Diffusion (BLD) process, guided by a masked CLIP-based\nsemantic loss. Click2Mask surpasses the limitations of segmentation-based and\nfine-tuning dependent methods, offering a more user-friendly and contextually\naccurate solution. Our experiments demonstrate that Click2Mask not only\nminimizes user effort but also delivers competitive or superior local image\nmanipulation results compared to SoTA methods, according to both human\njudgement and automatic metrics. Key contributions include the simplification\nof user input, the ability to freely add objects unconstrained by existing\nsegments, and the integration potential of our dynamic mask approach within\nother editing methods.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.08272","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Recent advancements in generative models have revolutionized image generation and editing, making these tasks accessible to non-experts. This paper focuses on local image editing, particularly the task of adding new content to a loosely specified area. Existing methods often require a precise mask or a detailed description of the location, which can be cumbersome and error-prone. We propose Click2Mask, a novel approach that simplifies the local editing process by requiring only a single point of reference (in addition to the content description). A mask is dynamically grown around this point during a Blended Latent Diffusion (BLD) process, guided by a masked CLIP-based semantic loss. Click2Mask overcomes the limitations of segmentation-based and fine-tuning-dependent methods, offering a more user-friendly and contextually accurate solution. Our experiments demonstrate that Click2Mask not only minimizes user effort but also delivers competitive or superior local image manipulation results compared to state-of-the-art (SoTA) methods, according to both human judgment and automatic metrics. Key contributions include the simplification of user input, the ability to freely add objects unconstrained by existing segments, and the potential to integrate our dynamic mask approach within other editing methods.
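
To make the core idea concrete, below is a minimal conceptual sketch of how a mask might be grown around a single click point inside a blended-diffusion loop under a masked semantic loss. This is not the authors' implementation: `denoise_step`, `blend_latents`, and `masked_clip_loss` are hypothetical stand-ins for the actual diffusion model, the BLD blending step, and the masked CLIP loss, and the gradient-based mask update is an illustrative assumption.

```python
import torch


def denoise_step(latent: torch.Tensor, t: int) -> torch.Tensor:
    """Hypothetical stand-in for one reverse step of a latent diffusion model."""
    return latent - 0.01 * torch.randn_like(latent)


def blend_latents(edited: torch.Tensor, background: torch.Tensor,
                  mask: torch.Tensor) -> torch.Tensor:
    """BLD-style blending: edited content inside the mask, original outside."""
    return mask * edited + (1 - mask) * background


def masked_clip_loss(latent: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Toy stand-in for a CLIP-based semantic loss restricted to the mask."""
    return (mask * latent).pow(2).mean()


def click2mask_sketch(background_latent: torch.Tensor, click_yx: tuple,
                      steps: int = 50, grow_lr: float = 0.1):
    h, w = background_latent.shape[-2:]
    # Soft mask parameterized by logits, seeded around the user's click point.
    logits = torch.full((1, 1, h, w), -4.0)
    logits[..., click_yx[0], click_yx[1]] = 4.0
    logits.requires_grad_(True)

    latent = background_latent.clone()
    for t in reversed(range(steps)):
        mask = torch.sigmoid(logits)                   # differentiable soft mask
        latent = denoise_step(latent, t)               # one diffusion step
        latent = blend_latents(latent, background_latent, mask)
        loss = masked_clip_loss(latent, mask)          # semantic guidance signal
        (grad,) = torch.autograd.grad(loss, logits)
        with torch.no_grad():
            logits -= grow_lr * grad                   # let the mask evolve/grow
        latent = latent.detach()                       # cut graph between steps
    return torch.sigmoid(logits).detach(), latent


if __name__ == "__main__":
    bg = torch.randn(1, 4, 64, 64)                     # dummy background latent
    mask, edited = click2mask_sketch(bg, click_yx=(32, 32))
    print(mask.shape, edited.shape)
```

In the actual method, the guidance is a masked CLIP similarity between the evolving edit and the text prompt, and the blending follows BLD; the toy update above only illustrates the key design choice, namely that the mask is treated as a quantity optimized during denoising rather than a fixed, user-provided segment.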