Text-guided floral image generation based on lightweight deep attention feature fusion GAN

Wenji Yang, Hang An, Wenchao Hu, Xinxin Ma, Liping Xie

The Visual Computer (published 2024-09-14). DOI: 10.1007/s00371-024-03617-7
Abstract
Generating floral images conditioned on textual descriptions is a highly challenging task. Most existing text-to-floral image synthesis methods adopt a single-stage generation architecture, which often requires substantial hardware resources, such as large-scale GPU clusters and large numbers of training images. Moreover, this architecture tends to lose detail features when shallow image features are fused with deep image features. To address these challenges, this paper proposes a Lightweight Deep Attention Feature Fusion Generative Adversarial Network (LDAF-GAN) for text-to-floral image generation. The network performs well even with limited hardware resources. First, we introduce a novel Deep Attention Text-Image Fusion Block that leverages a multi-scale channel attention mechanism to enhance the detail fidelity and visual consistency of the generated floral images. Second, we propose a novel Self-Supervised Target-Aware Discriminator capable of learning richer feature mappings from input images, which not only helps the generator create higher-quality images but also improves GAN training efficiency, further reducing resource consumption. Finally, extensive experiments on datasets of three different sample sizes validate the effectiveness of the proposed model. Source code and pretrained models are available at https://github.com/BoomAnm/LDAF-GAN.
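To make the fusion idea concrete, below is a minimal sketch (not the authors' code) of a text-image fusion block built around a multi-scale channel attention mechanism, in the spirit of the Deep Attention Text-Image Fusion Block the abstract describes. All class names, channel sizes, and the exact wiring are illustrative assumptions; the paper's actual design may differ.

```python
# Hypothetical sketch of multi-scale channel attention text-image fusion.
# Names, dimensions, and wiring are assumptions, not the published model.
import torch
import torch.nn as nn

class MultiScaleChannelAttention(nn.Module):
    """Combines a global (pooled) and a local (per-pixel) channel descriptor."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 1)
        # Global branch: squeeze spatial dims, then excite channels.
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1),
        )
        # Local branch: same excitation, but keeps spatial resolution
        # so fine details are weighted per position, not just per channel.
        self.local_branch = nn.Sequential(
            nn.Conv2d(channels, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcast-add the two scales, then squash to attention weights.
        weights = torch.sigmoid(self.global_branch(x) + self.local_branch(x))
        return x * weights

class TextImageFusionBlock(nn.Module):
    """Injects a sentence embedding into an image feature map, then
    re-weights channels with multi-scale attention."""
    def __init__(self, img_channels: int, text_dim: int):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, img_channels)
        self.attention = MultiScaleChannelAttention(img_channels)

    def forward(self, img_feat: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = img_feat.shape
        # Project the text embedding and broadcast it over spatial positions.
        cond = self.text_proj(text_emb).view(b, c, 1, 1)
        return self.attention(img_feat + cond)

# Usage: fuse a 256-channel 16x16 feature map with a 128-d text embedding.
block = TextImageFusionBlock(img_channels=256, text_dim=128)
out = block(torch.randn(2, 256, 16, 16), torch.randn(2, 128))
print(out.shape)  # torch.Size([2, 256, 16, 16])
```

The two-branch design is what makes the attention "multi-scale": the global branch captures image-wide channel statistics, while the local branch preserves spatial detail, which matches the abstract's stated goal of retaining shallow detail features during fusion.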