Generative Adversarial Networks with Learnable Auxiliary Module for Image Synthesis
Yan Gan, Chenxue Yang, Mao Ye, Renjie Huang, Deqiang Ouyang
ACM Transactions on Multimedia Computing, Communications and Applications, published 2024-03-17. DOI: 10.1145/3653021
Abstract
Training generative adversarial networks (GANs) for noise-to-image synthesis is a challenging task, primarily due to the instability of the GAN training process. One of the key issues is the generator’s sensitivity to its input data: certain inputs can cause sudden fluctuations in the generator’s loss value. This sensitivity indicates that the generator lacks the ability to resist input disturbances, which causes the discriminator’s loss value to oscillate and degrades the discriminator. The resulting negative feedback from the discriminator is, in turn, unhelpful for updating the generator’s parameters, leading to suboptimal image generation quality. In response to this challenge, we present a GAN model equipped with a learnable auxiliary module that processes auxiliary noise. The core objective of this module is to enhance the stability of both the generator and the discriminator throughout training. To achieve this, we incorporate a learnable auxiliary penalty and an augmented discriminator, designed to control the generator and reinforce the discriminator’s stability, respectively. We further apply our method to the Hinge and LSGAN loss functions, illustrating its efficacy in reducing the instability of both the generator and the discriminator. Experiments on the LSUN, CelebA, Market-1501 and Creative Senz3D datasets demonstrate our method’s ability to improve the training stability and overall performance of the baseline methods.
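For context, the two adversarial objectives named in the abstract are standard. With discriminator D, generator G, real data x and latent noise z, the hinge and LSGAN losses are commonly written as

L_D = \mathbb{E}_x[\max(0, 1 - D(x))] + \mathbb{E}_z[\max(0, 1 + D(G(z)))], \quad L_G = -\mathbb{E}_z[D(G(z))] \quad \text{(hinge)}

L_D = \tfrac{1}{2}\mathbb{E}_x[(D(x) - 1)^2] + \tfrac{1}{2}\mathbb{E}_z[D(G(z))^2], \quad L_G = \tfrac{1}{2}\mathbb{E}_z[(D(G(z)) - 1)^2] \quad \text{(LSGAN)}

The abstract does not spell out the auxiliary module’s architecture or how the auxiliary penalty enters the objective, so the PyTorch-style sketch below is only one plausible reading under stated assumptions: AuxModule, aux_weight, the choice to add a learned transform of auxiliary noise to the latent code, and the penalty that rewards disturbance resistance are all hypothetical illustrations, not the authors’ actual design.

import torch
import torch.nn as nn

class AuxModule(nn.Module):
    # Hypothetical learnable auxiliary module: maps auxiliary noise to a
    # bounded perturbation of the generator's latent input.
    def __init__(self, noise_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(noise_dim, noise_dim), nn.Tanh())

    def forward(self, aux_noise):
        return self.net(aux_noise)

def d_hinge(d_real, d_fake):
    # Standard hinge loss for the discriminator.
    return torch.relu(1.0 - d_real).mean() + torch.relu(1.0 + d_fake).mean()

def g_hinge(d_fake):
    # Standard hinge loss for the generator.
    return -d_fake.mean()

def train_step(G, D, aux, opt_g, opt_d, real, noise_dim, aux_weight=0.1):
    # opt_g is assumed to optimize the parameters of both G and aux.
    z = torch.randn(real.size(0), noise_dim, device=real.device)
    aux_noise = torch.randn(real.size(0), noise_dim, device=real.device)
    z_disturbed = z + aux(aux_noise)  # latent code with learnable disturbance

    # Discriminator step: train D on real samples and on fakes generated
    # from the disturbed latent code (a loose reading of "augmented").
    d_loss = d_hinge(D(real), D(G(z_disturbed).detach()))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: adversarial term plus a hypothetical auxiliary penalty
    # that keeps the outputs for z and z_disturbed close, i.e. rewards the
    # generator for resisting input disturbances.
    fake = G(z_disturbed)
    penalty = (G(z) - fake).pow(2).mean()
    g_loss = g_hinge(D(fake)) + aux_weight * penalty
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

In this reading, the penalty term plays the role the abstract assigns to the learnable auxiliary penalty (controlling the generator), and training D on fakes produced from the disturbed latent code stands in for the augmented discriminator; replacing d_hinge and g_hinge with the least-squares terms above would apply the same scheme to the LSGAN loss.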
Journal Description
The ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) is the flagship publication of the ACM Special Interest Group on Multimedia (SIGMM). It solicits paper submissions on all aspects of multimedia; papers on single media (for instance, audio, video, or animation) and their processing are also welcome.
TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles in each issue; in addition, all special issues are published online-only to ensure timely publication. The transactions consist primarily of research papers. As an archival journal, TOMM intends its papers to have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.