梦想捕手针对语义一致的文本到图像个性化的外观匹配自我关注

ArXiv Pub Date : 2024-02-15 DOI:10.48550/arXiv.2402.09812

Jisu Nam, Heesu Kim, DongJae Lee, Siyoon Jin, Seungryong Kim, Seunggyu Chang

{"title":"梦想捕手针对语义一致的文本到图像个性化的外观匹配自我关注","authors":"Jisu Nam, Heesu Kim, DongJae Lee, Siyoon Jin, Seungryong Kim, Seunggyu Chang","doi":"10.48550/arXiv.2402.09812","DOIUrl":null,"url":null,"abstract":"The objective of text-to-image (T2I) personalization is to customize a diffusion model to a user-provided reference concept, generating diverse images of the concept aligned with the target prompts. Conventional methods representing the reference concepts using unique text embeddings often fail to accurately mimic the appearance of the reference. To address this, one solution may be explicitly conditioning the reference images into the target denoising process, known as key-value replacement. However, prior works are constrained to local editing since they disrupt the structure path of the pre-trained T2I model. To overcome this, we propose a novel plug-in method, called DreamMatcher, which reformulates T2I personalization as semantic matching. Specifically, DreamMatcher replaces the target values with reference values aligned by semantic matching, while leaving the structure path unchanged to preserve the versatile capability of pre-trained T2I models for generating diverse structures. We also introduce a semantic-consistent masking strategy to isolate the personalized concept from irrelevant regions introduced by the target prompts. Compatible with existing T2I models, DreamMatcher shows significant improvements in complex scenarios. Intensive analyses demonstrate the effectiveness of our approach.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"7 9","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization\",\"authors\":\"Jisu Nam, Heesu Kim, DongJae Lee, Siyoon Jin, Seungryong Kim, Seunggyu Chang\",\"doi\":\"10.48550/arXiv.2402.09812\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The objective of text-to-image (T2I) personalization is to customize a diffusion model to a user-provided reference concept, generating diverse images of the concept aligned with the target prompts. Conventional methods representing the reference concepts using unique text embeddings often fail to accurately mimic the appearance of the reference. To address this, one solution may be explicitly conditioning the reference images into the target denoising process, known as key-value replacement. However, prior works are constrained to local editing since they disrupt the structure path of the pre-trained T2I model. To overcome this, we propose a novel plug-in method, called DreamMatcher, which reformulates T2I personalization as semantic matching. Specifically, DreamMatcher replaces the target values with reference values aligned by semantic matching, while leaving the structure path unchanged to preserve the versatile capability of pre-trained T2I models for generating diverse structures. We also introduce a semantic-consistent masking strategy to isolate the personalized concept from irrelevant regions introduced by the target prompts. Compatible with existing T2I models, DreamMatcher shows significant improvements in complex scenarios. Intensive analyses demonstrate the effectiveness of our approach.\",\"PeriodicalId\":8425,\"journal\":{\"name\":\"ArXiv\",\"volume\":\"7 9\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ArXiv\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2402.09812\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ArXiv","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2402.09812","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

文本到图像（T2I）个性化的目标是根据用户提供的参考概念定制扩散模型，生成与目标提示一致的各种概念图像。使用独特文本嵌入来表示参考概念的传统方法往往无法准确模仿参考概念的外观。为解决这一问题，一种解决方案是在目标去噪过程中明确调节参考图像，即所谓的键值替换。然而，之前的工作仅限于局部编辑，因为它们会破坏预训练 T2I 模型的结构路径。为了克服这一问题，我们提出了一种名为 DreamMatcher 的新颖插件方法，它将 T2I 个性化重新表述为语义匹配。具体来说，DreamMatcher 将目标值替换为通过语义匹配对齐的参考值，同时保持结构路径不变，以保留预训练 T2I 模型生成多样化结构的通用能力。我们还引入了语义一致的屏蔽策略，将个性化概念与目标提示引入的无关区域隔离开来。DreamMatcher 与现有的 T2I 模型兼容，在复杂场景中表现出显著的改进。深入分析证明了我们方法的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

The objective of text-to-image (T2I) personalization is to customize a diffusion model to a user-provided reference concept, generating diverse images of the concept aligned with the target prompts. Conventional methods representing the reference concepts using unique text embeddings often fail to accurately mimic the appearance of the reference. To address this, one solution may be explicitly conditioning the reference images into the target denoising process, known as key-value replacement. However, prior works are constrained to local editing since they disrupt the structure path of the pre-trained T2I model. To overcome this, we propose a novel plug-in method, called DreamMatcher, which reformulates T2I personalization as semantic matching. Specifically, DreamMatcher replaces the target values with reference values aligned by semantic matching, while leaving the structure path unchanged to preserve the versatile capability of pre-trained T2I models for generating diverse structures. We also introduce a semantic-consistent masking strategy to isolate the personalized concept from irrelevant regions introduced by the target prompts. Compatible with existing T2I models, DreamMatcher shows significant improvements in complex scenarios. Intensive analyses demonstrate the effectiveness of our approach.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ArXiv

自引率

0.00%

发文量