Alfie: Democratising RGBA Image Generation With No $$$

Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara
{"title":"阿尔菲不花一分钱实现 RGBA 图像生成的民主化","authors":"Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara","doi":"arxiv-2408.14826","DOIUrl":null,"url":null,"abstract":"Designs and artworks are ubiquitous across various creative fields, requiring\ngraphic design skills and dedicated software to create compositions that\ninclude many graphical elements, such as logos, icons, symbols, and art scenes,\nwhich are integral to visual storytelling. Automating the generation of such\nvisual elements improves graphic designers' productivity, democratizes and\ninnovates the creative industry, and helps generate more realistic synthetic\ndata for related tasks. These illustration elements are mostly RGBA images with\nirregular shapes and cutouts, facilitating blending and scene composition.\nHowever, most image generation models are incapable of generating such images\nand achieving this capability requires expensive computational resources,\nspecific training recipes, or post-processing solutions. In this work, we\npropose a fully-automated approach for obtaining RGBA illustrations by\nmodifying the inference-time behavior of a pre-trained Diffusion Transformer\nmodel, exploiting the prompt-guided controllability and visual quality offered\nby such models with no additional computational cost. We force the generation\nof entire subjects without sharp croppings, whose background is easily removed\nfor seamless integration into design projects or artistic scenes. We show with\na user study that, in most cases, users prefer our solution over generating and\nthen matting an image, and we show that our generated illustrations yield good\nresults when used as inputs for composite scene generation pipelines. We\nrelease the code at https://github.com/aimagelab/Alfie.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"26 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Alfie: Democratising RGBA Image Generation With No $$$\",\"authors\":\"Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara\",\"doi\":\"arxiv-2408.14826\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Designs and artworks are ubiquitous across various creative fields, requiring\\ngraphic design skills and dedicated software to create compositions that\\ninclude many graphical elements, such as logos, icons, symbols, and art scenes,\\nwhich are integral to visual storytelling. Automating the generation of such\\nvisual elements improves graphic designers' productivity, democratizes and\\ninnovates the creative industry, and helps generate more realistic synthetic\\ndata for related tasks. These illustration elements are mostly RGBA images with\\nirregular shapes and cutouts, facilitating blending and scene composition.\\nHowever, most image generation models are incapable of generating such images\\nand achieving this capability requires expensive computational resources,\\nspecific training recipes, or post-processing solutions. In this work, we\\npropose a fully-automated approach for obtaining RGBA illustrations by\\nmodifying the inference-time behavior of a pre-trained Diffusion Transformer\\nmodel, exploiting the prompt-guided controllability and visual quality offered\\nby such models with no additional computational cost. 
We force the generation\\nof entire subjects without sharp croppings, whose background is easily removed\\nfor seamless integration into design projects or artistic scenes. We show with\\na user study that, in most cases, users prefer our solution over generating and\\nthen matting an image, and we show that our generated illustrations yield good\\nresults when used as inputs for composite scene generation pipelines. We\\nrelease the code at https://github.com/aimagelab/Alfie.\",\"PeriodicalId\":501480,\"journal\":{\"name\":\"arXiv - CS - Multimedia\",\"volume\":\"26 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.14826\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.14826","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Designs and artworks are ubiquitous across various creative fields, requiring graphic design skills and dedicated software to create compositions that include many graphical elements, such as logos, icons, symbols, and art scenes, which are integral to visual storytelling. Automating the generation of such visual elements improves graphic designers' productivity, democratizes and innovates the creative industry, and helps generate more realistic synthetic data for related tasks. These illustration elements are mostly RGBA images with irregular shapes and cutouts, facilitating blending and scene composition. However, most image generation models are incapable of generating such images, and achieving this capability requires expensive computational resources, specific training recipes, or post-processing solutions. In this work, we propose a fully-automated approach for obtaining RGBA illustrations by modifying the inference-time behavior of a pre-trained Diffusion Transformer model, exploiting the prompt-guided controllability and visual quality offered by such models with no additional computational cost. We force the generation of entire subjects without sharp croppings, whose background is easily removed for seamless integration into design projects or artistic scenes. We show with a user study that, in most cases, users prefer our solution over generating and then matting an image, and we show that our generated illustrations yield good results when used as inputs for composite scene generation pipelines. We release the code at https://github.com/aimagelab/Alfie.
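The abstract outlines a two-stage idea: prompt-guided generation of a whole, uncropped subject with a pre-trained Diffusion Transformer, followed by background removal to obtain an RGBA cut-out. The sketch below illustrates only that overall workflow under stated assumptions: it uses the diffusers PixArtAlphaPipeline as a stand-in Diffusion Transformer, a hypothetical prompt suffix to nudge the model toward a full subject on a plain background, and a naive white-threshold alpha extraction. It is not the Alfie method itself, which modifies the model's inference-time behavior rather than relying on post-hoc matting; see the released code at https://github.com/aimagelab/Alfie for the actual approach.

```python
# Minimal sketch, NOT the Alfie method: prompt-guided generation with a
# pre-trained Diffusion Transformer, then a naive background-to-alpha step.
# The model id, prompt suffix, and threshold values are illustrative assumptions.
import numpy as np
import torch
from diffusers import PixArtAlphaPipeline
from PIL import Image

# Load a pre-trained Diffusion Transformer (PixArt-alpha checkpoint, assumed available).
pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-512x512", torch_dtype=torch.float16
).to("cuda")

prompt = "a watercolor illustration of a fox"
# Hypothetical prompt suffix encouraging a whole, uncropped subject on a uniform
# background that is easy to strip afterwards.
full_prompt = prompt + ", full subject, centered, on a plain white background"

image = pipe(full_prompt, num_inference_steps=20, guidance_scale=4.5).images[0]

# Naive alpha extraction: pixels close to white are treated as background.
rgb = np.asarray(image.convert("RGB")).astype(np.float32)
dist_from_white = np.linalg.norm(255.0 - rgb, axis=-1)        # 0 = pure white pixel
alpha = np.clip(dist_from_white / 60.0, 0.0, 1.0) * 255.0     # soft threshold to [0, 255]
rgba = np.dstack([rgb, alpha]).astype(np.uint8)

Image.fromarray(rgba, mode="RGBA").save("fox_cutout.png")
```

The resulting PNG can be layered into a design or composite scene; a hard threshold like this will struggle with bright subjects or shadows, which is precisely the kind of limitation the paper's inference-time modification is meant to avoid.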