Alfie: Democratising RGBA Image Generation With No $$$
Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara
arXiv - CS - Multimedia, arXiv:2408.14826, published 2024-08-27
Designs and artworks are ubiquitous across various creative fields, requiring
graphic design skills and dedicated software to create compositions that
include many graphical elements, such as logos, icons, symbols, and art scenes,
which are integral to visual storytelling. Automating the generation of such
visual elements improves graphic designers' productivity, democratizes the
creative industry and fosters innovation, and helps generate more realistic synthetic
data for related tasks. These illustration elements are mostly RGBA images with
irregular shapes and cutouts, facilitating blending and scene composition.
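The blending these RGBA images enable is standard alpha ("over") compositing. As an illustrative sketch (not part of the paper's code), the Porter-Duff over operator can be written in NumPy as:

```python
import numpy as np

def composite_over(fg, bg):
    """Porter-Duff 'over': place an RGBA foreground onto an RGBA background.

    fg, bg: (H, W, 4) uint8 arrays; returns the blended (H, W, 4) uint8 image.
    """
    fa = fg[..., 3:4] / 255.0        # foreground alpha in [0, 1]
    ba = bg[..., 3:4] / 255.0        # background alpha in [0, 1]
    out_a = fa + ba * (1.0 - fa)     # combined coverage
    # Blend premultiplied colors, then un-premultiply (guarding against /0).
    out_rgb = (fg[..., :3] * fa + bg[..., :3] * ba * (1.0 - fa)) / np.maximum(out_a, 1e-8)
    return np.dstack([out_rgb, out_a * 255.0]).astype(np.uint8)
```

Transparent foreground pixels let the background show through, which is what makes irregularly shaped cutouts compose seamlessly into a scene.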
However, most image generation models cannot generate such images, and
achieving this capability requires expensive computational resources, specific
training recipes, or post-processing solutions. In this work, we
propose a fully automated approach for obtaining RGBA illustrations by
modifying the inference-time behavior of a pre-trained Diffusion Transformer
model, exploiting the prompt-guided controllability and visual quality offered
by such models with no additional computational cost. We force the generation
of entire subjects without sharp croppings, whose backgrounds are easily removed
for seamless integration into design projects or artistic scenes. We show with
a user study that, in most cases, users prefer our solution over generating and
then matting an image, and we show that our generated illustrations yield good
results when used as inputs for composite scene generation pipelines. We
release the code at https://github.com/aimagelab/Alfie.
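As a naive illustration of the "easily removed background" step, not the paper's actual method: once a subject is generated on a near-white backdrop, an alpha channel can be derived by thresholding. Real pipelines would use a proper matting model instead.

```python
import numpy as np

def white_to_alpha(rgb, threshold=245):
    """Naive background removal: treat near-white pixels as transparent.

    A crude stand-in for matting, for illustration only.
    rgb: (H, W, 3) uint8 array; returns an (H, W, 4) uint8 RGBA image.
    """
    background = np.all(rgb >= threshold, axis=-1)          # near-white mask
    alpha = np.where(background, 0, 255).astype(np.uint8)   # 0 = transparent
    return np.dstack([rgb, alpha])
```

The resulting RGBA image can then be dropped into a design project or composited into a scene with a standard alpha-over blend.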