Alfie: Democratising RGBA Image Generation With No $$$
Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara
arXiv - CS - Multimedia, arXiv:2408.14826, published 2024-08-27
Designs and artworks are ubiquitous across various creative fields, requiring
graphic design skills and dedicated software to create compositions that
include many graphical elements, such as logos, icons, symbols, and art scenes,
which are integral to visual storytelling. Automating the generation of such
visual elements improves graphic designers' productivity, democratizes the
creative industry and fosters innovation, and helps generate more realistic synthetic
data for related tasks. These illustration elements are mostly RGBA images with
irregular shapes and cutouts, facilitating blending and scene composition.
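The blending these RGBA images enable is standard alpha ("over") compositing. As an illustrative sketch (not part of the paper's code), the Porter-Duff over operator can be written in NumPy as:

```python
import numpy as np

def composite_over(fg, bg):
    """Porter-Duff 'over': place an RGBA foreground onto an RGBA background.

    fg, bg: (H, W, 4) uint8 arrays; returns the blended (H, W, 4) uint8 image.
    """
    fa = fg[..., 3:4] / 255.0        # foreground alpha in [0, 1]
    ba = bg[..., 3:4] / 255.0        # background alpha in [0, 1]
    out_a = fa + ba * (1.0 - fa)     # combined coverage
    # Blend premultiplied colors, then un-premultiply (guarding against /0).
    out_rgb = (fg[..., :3] * fa + bg[..., :3] * ba * (1.0 - fa)) / np.maximum(out_a, 1e-8)
    return np.dstack([out_rgb, out_a * 255.0]).astype(np.uint8)
```

Transparent foreground pixels let the background show through, which is what makes irregularly shaped cutouts compose seamlessly into a scene.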
However, most image generation models cannot generate such images, and
achieving this capability requires expensive computational resources, specific
training recipes, or post-processing solutions. In this work, we
propose a fully automated approach for obtaining RGBA illustrations by
modifying the inference-time behavior of a pre-trained Diffusion Transformer
model, exploiting the prompt-guided controllability and visual quality offered
by such models with no additional computational cost. We force the generation
of entire subjects without sharp croppings, whose backgrounds are easily removed
for seamless integration into design projects or artistic scenes. We show with
a user study that, in most cases, users prefer our solution over generating and
then matting an image, and we show that our generated illustrations yield good
results when used as inputs for composite scene generation pipelines. We
release the code at https://github.com/aimagelab/Alfie.
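As a naive illustration of the "easily removed background" step, not the paper's actual method: once a subject is generated on a near-white backdrop, an alpha channel can be derived by thresholding. Real pipelines would use a proper matting model instead.

```python
import numpy as np

def white_to_alpha(rgb, threshold=245):
    """Naive background removal: treat near-white pixels as transparent.

    A crude stand-in for matting, for illustration only.
    rgb: (H, W, 3) uint8 array; returns an (H, W, 4) uint8 RGBA image.
    """
    background = np.all(rgb >= threshold, axis=-1)          # near-white mask
    alpha = np.where(background, 0, 255).astype(np.uint8)   # 0 = transparent
    return np.dstack([rgb, alpha])
```

The resulting RGBA image can then be dropped into a design project or composited into a scene with a standard alpha-over blend.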