Local part attention for image stylization with text prompt

Quoc-Truong Truong, Vinh-Tiep Nguyen, Lan-Phuong Nguyen, Hung-Phu Cao, Duc-Tuan Luu
Neural Computing and Applications, published 2024-09-17. DOI: 10.1007/s00521-024-10394-w (https://doi.org/10.1007/s00521-024-10394-w)
Citation count: 0


Abstract

Prompt-based portrait style transfer translates an input content image into a style described by text, without requiring a style image. In many practical situations, users care not only about the entire portrait but also about local parts (e.g., eyes, lips, and hair). To support such use cases, we propose a new framework that applies a text-described style to specific regions of the portrait. Specifically, we use semantic segmentation to identify the intended region, so the user does not have to supply an edit mask, and a pre-trained CLIP-based model to perform the stylization. In addition, we propose a text-to-patch matching loss, computed over patches obtained by randomly dividing the stylized image, to keep the quality of the result consistent. To evaluate the proposed method comprehensively, we report several metrics, including FID, SSIM, and PSNR, on a dataset that combines portraits from CelebAMask-HQ with style descriptions taken from related work. Extensive experiments show that our framework outperforms other state-of-the-art methods in both stylization quality and inference time.
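The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: restricting stylization to a segmented region, and scoring random patches of the stylized image against a text embedding. Everything here is an assumption for illustration: the function names, the patch count and patch size, the use of cosine distance, and in particular `toy_embed`, a hypothetical stand-in for a CLIP image encoder. It is not the authors' code.

```python
import numpy as np


def composite_region(content, stylized, mask):
    """Keep stylized pixels where mask == 1 and original pixels elsewhere."""
    m = mask.astype(np.float32)[..., None]  # (H, W, 1), broadcasts over channels
    return m * stylized + (1.0 - m) * content


def random_patches(image, num_patches, patch_size, rng):
    """Crop square patches at uniformly random positions."""
    h, w = image.shape[:2]
    patches = []
    for _ in range(num_patches):
        top = rng.integers(0, h - patch_size + 1)
        left = rng.integers(0, w - patch_size + 1)
        patches.append(image[top:top + patch_size, left:left + patch_size])
    return np.stack(patches)


def cosine_distance(a, b):
    """1 - cosine similarity, with a small epsilon to avoid division by zero."""
    a = a / (np.linalg.norm(a) + 1e-8)
    b = b / (np.linalg.norm(b) + 1e-8)
    return 1.0 - float(a @ b)


def text_to_patch_loss(stylized, text_embedding, embed_image,
                       num_patches=8, patch_size=32, seed=0):
    """Mean cosine distance between patch embeddings and the text embedding.

    Assumed form of a text-to-patch matching loss; the paper's exact
    formulation is not given in the abstract.
    """
    rng = np.random.default_rng(seed)
    patches = random_patches(stylized, num_patches, patch_size, rng)
    distances = [cosine_distance(embed_image(p), text_embedding) for p in patches]
    return float(np.mean(distances))


def toy_embed(patch):
    """Hypothetical stand-in for a CLIP image encoder: mean colour per channel."""
    return patch.reshape(-1, patch.shape[-1]).mean(axis=0)


# Toy usage: a black "content" image, a white "stylized" image, and a square
# mask standing in for a segmented region such as hair or lips.
content = np.zeros((64, 64, 3), dtype=np.float32)
stylized = np.ones((64, 64, 3), dtype=np.float32)
mask = np.zeros((64, 64), dtype=np.uint8)
mask[16:48, 16:48] = 1

result = composite_region(content, stylized, mask)
loss = text_to_patch_loss(result, np.ones(3), toy_embed)
```

In a real system the mask would come from a semantic segmentation model and both embeddings from a CLIP-style encoder. The hard 0/1 mask used here could also be feathered at the region boundary to avoid visible seams, which is a design choice rather than something the abstract specifies.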
