Diversified realistic face image generation GAN for human subjects in multimedia content creation
Authors: Lalit Kumar, Dushyant Kumar Singh
Journal: Computer Animation and Virtual Worlds, 35(2) (JCR Q4, Computer Science, Software Engineering; impact factor 0.9)
Published: 2024-04-03
DOI: 10.1002/cav.2232
URL: https://onlinelibrary.wiley.com/doi/10.1002/cav.2232
Citations: 0
Abstract
Face image generation with GAN models plays an important role in creating innovative and unique multimedia content. Despite their strengths, GAN models face numerous challenges in human face image generation, including blurry images, incomplete facial details, and high computational requirements. In this manuscript, we propose a GAN model that combines the strengths of VGG-16 and ResNet-50 to overcome these difficulties. The discriminator is built on VGG-16 and distinguishes real images from fake ones. The generator combines components from ResNet-50 and VGG-16 to improve image generation at each iteration, resulting in realistic face images. The generator of the proposed DRFI GAN (Diversified and Realistic Face Image Generation GAN) achieves an FID score of 20.50, lower than that of existing state-of-the-art approaches. Furthermore, our findings indicate that images generated by the DRFI GAN model exhibit 10%–15% greater efficiency and realism, with reduced training time and lower FID scores, compared to existing state-of-the-art methods.
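A note on the metric: the FID (Fréchet Inception Distance) score quoted in the abstract is conventionally defined as shown below. This is the standard definition rather than anything specific to this paper. The symbols are notation introduced here for illustration, and the abstract does not state which feature extractor or sample sizes the authors used.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Here $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ denote the mean and covariance of feature embeddings (typically Inception-v3 pooling activations) computed over real and generated images, respectively. The score is 0 when the two sets of statistics coincide, and lower values indicate that the generated images are statistically closer to the real ones.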
About the journal:
With the advent of very powerful PCs and high-end graphics cards, there has been incredible development in Virtual Worlds, real-time computer animation and simulation, and games. At the same time, new and cheaper Virtual Reality devices have appeared, allowing interaction with these real-time Virtual Worlds and even with real worlds through Augmented Reality. Three-dimensional characters, especially Virtual Humans, are now of exceptional quality, which allows them to be used in the movie industry. But this is only a beginning: with the development of Artificial Intelligence and Agent technology, these characters will become more and more autonomous and even intelligent. They will inhabit the Virtual Worlds in a Virtual Life together with animals and plants.