SF-GAN: Semantic fusion generative adversarial networks for text-to-image synthesis
Bing Yang, Xueqin Xiang, Wanzeng Kong, Jianhai Zhang, Jinliang Yao
Expert Systems with Applications, published 2024-10-29. DOI: 10.1016/j.eswa.2024.125583
Citations: 0
Abstract
Text-to-image synthesis aims to generate high-quality, realistic images conditioned on text descriptions. The major challenge of this task lies in the deep and seamless integration of text and image features. In this paper, we therefore present a novel approach, semantic fusion generative adversarial networks (SF-GAN), for fine-grained text-to-image generation that enables efficient semantic interactions. Specifically, our proposed SF-GAN leverages a novel recurrent semantic fusion network to seamlessly control the global allocation of text information across discrete fusion blocks. Moreover, by employing a contrastive loss and dynamic convolution, SF-GAN can fuse text and image information more accurately and further improve semantic consistency during the generation stage. During the discrimination stage, we introduce a word-level discriminator designed to offer the generator precise feedback on each individual word. Compared with current state-of-the-art techniques, our SF-GAN demonstrates remarkable effectiveness in generating realistic, text-aligned images, outperforming its contemporaries on challenging benchmark datasets.
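To make two of the ideas above concrete, below is a minimal, illustrative PyTorch sketch of (1) a text-conditioned fusion block that modulates image features with sentence-driven affine parameters, and (2) a symmetric contrastive (InfoNCE-style) loss for image-text alignment. All class names, dimensions, and design details here are assumptions for illustration only; the paper's actual recurrent semantic fusion network, dynamic convolution, and word-level discriminator are more involved.

```python
# Illustrative sketch only; not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticFusionBlock(nn.Module):
    """Fuses a sentence embedding into image features via predicted
    channel-wise affine parameters (scale and shift)."""

    def __init__(self, channels: int, text_dim: int):
        super().__init__()
        self.to_gamma = nn.Linear(text_dim, channels)  # predicts scale
        self.to_beta = nn.Linear(text_dim, channels)   # predicts shift
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feat: torch.Tensor, sent: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) image features; sent: (B, text_dim) sentence embedding
        gamma = self.to_gamma(sent).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = self.to_beta(sent).unsqueeze(-1).unsqueeze(-1)
        fused = feat * (1 + gamma) + beta                 # text-conditioned modulation
        return feat + self.conv(F.leaky_relu(fused, 0.2))  # residual fusion


def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE loss: pulls matched image/text pairs together and
    pushes mismatched pairs within the batch apart."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    # Usage sketch with random tensors.
    block = SemanticFusionBlock(channels=64, text_dim=256)
    out = block(torch.randn(4, 64, 32, 32), torch.randn(4, 256))  # (4, 64, 32, 32)
    loss = contrastive_loss(torch.randn(4, 128), torch.randn(4, 128))
    print(out.shape, loss.item())
```

In a full generator, several such blocks would typically be stacked across resolutions so text information is injected repeatedly, and the contrastive term would be computed between text embeddings and generated-image embeddings; the in-batch version above is a common simplification.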
Journal overview:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.