Attention-Based Image-to-Video Translation for Synthesizing Facial Expression Using GAN

Journal of Electrical and Computer Engineering · IF 1.2 · Q4 (COMPUTER SCIENCE, INFORMATION SYSTEMS)
Pub Date: 2023-11-14 · DOI: 10.1155/2023/6645356
Kidist Alemayehu, Worku Jifara, Demissie Jobir
{"title":"Attention-Based Image-to-Video Translation for Synthesizing Facial Expression Using GAN","authors":"Kidist Alemayehu, Worku Jifara, Demissie Jobir","doi":"10.1155/2023/6645356","DOIUrl":null,"url":null,"abstract":"The fundamental challenge in video generation is not only generating high-quality image sequences but also generating consistent frames with no abrupt shifts. With the development of generative adversarial networks (GANs), great progress has been made in image generation tasks which can be used for facial expression synthesis. Most previous works focused on synthesizing frontal and near frontal faces and manual annotation. However, considering only the frontal and near frontal area is not sufficient for many real-world applications, and manual annotation fails when the video is incomplete. AffineGAN, a recent study, uses affine transformation in latent space to automatically infer the expression intensity value; however, this work requires extraction of the feature of the target ground truth image, and the generated sequence of images is also not sufficient. To address these issues, this study is proposed to infer the expression of intensity value automatically without the need to extract the feature of the ground truth images. The local dataset is prepared with frontal and with two different face positions (the left and right sides). Average content distance metrics of the proposed solution along with different experiments have been measured, and the proposed solution has shown improvements. The proposed method has improved the ACD-I of affine GAN from 1.606 ± 0.018 to 1.584 ± 0.00, ACD-C of affine GAN from 1.452 ± 0.008 to 1.430 ± 0.009, and ACD-G of affine GAN from 1.769 ± 0.007 to 1.744 ± 0.01, which is far better than AffineGAN. This work concludes that integrating self-attention into the generator network improves a quality of the generated images sequences. In addition, evenly distributing values based on frame size to assign expression intensity value improves the consistency of image sequences being generated. It also enables the generator to generate different frame size videos while remaining within the range [0, 1].","PeriodicalId":46573,"journal":{"name":"Journal of Electrical and Computer Engineering","volume":null,"pages":null},"PeriodicalIF":1.2000,"publicationDate":"2023-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Electrical and Computer Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1155/2023/6645356","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

The fundamental challenge in video generation is not only generating high-quality image sequences but also generating consistent frames with no abrupt shifts. With the development of generative adversarial networks (GANs), great progress has been made in image generation tasks that can be applied to facial expression synthesis. Most previous works focused on synthesizing frontal and near-frontal faces and relied on manual annotation. However, considering only the frontal and near-frontal area is insufficient for many real-world applications, and manual annotation fails when the video is incomplete. AffineGAN, a recent study, uses an affine transformation in latent space to infer the expression intensity value automatically; however, it requires extracting features of the target ground-truth image, and the image sequences it generates are also not sufficient. To address these issues, this study proposes inferring the expression intensity value automatically without extracting features of the ground-truth images. A local dataset was prepared with frontal faces and two additional face positions (the left and right sides). Average content distance (ACD) metrics of the proposed solution were measured across several experiments, and the proposed solution shows consistent improvements over AffineGAN: ACD-I improves from 1.606 ± 0.018 to 1.584 ± 0.00, ACD-C from 1.452 ± 0.008 to 1.430 ± 0.009, and ACD-G from 1.769 ± 0.007 to 1.744 ± 0.01. This work concludes that integrating self-attention into the generator network improves the quality of the generated image sequences. In addition, evenly distributing expression intensity values according to the number of frames improves the consistency of the generated image sequences and enables the generator to produce videos with different frame counts while keeping intensity values within the range [0, 1].
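To make the intensity-scheduling idea concrete, here is a minimal sketch of evenly distributed expression-intensity values over a clip of arbitrary length, as the abstract describes; the function name and the use of NumPy are illustrative assumptions, not the paper's actual code:

```python
import numpy as np

def intensity_schedule(num_frames: int) -> np.ndarray:
    """Return evenly spaced expression-intensity values in [0, 1],
    one per frame, for a clip of any length."""
    return np.linspace(0.0, 1.0, num_frames)

# A 5-frame clip gets intensities [0.0, 0.25, 0.5, 0.75, 1.0];
# a 9-frame clip gets a finer schedule over the same [0, 1] range.
print(intensity_schedule(5))
```

Because the schedule always spans [0, 1] regardless of frame count, the generator can be conditioned the same way for short and long clips, which is what allows different video lengths without changing the intensity range.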
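The abstract credits self-attention in the generator for the quality gains. As a hedged illustration, below is a SAGAN-style self-attention block in PyTorch; the paper's architecture is not given here, so the module layout (1×1 convolutions, an 8× channel reduction, and a learned residual weight) is an assumption rather than the authors' design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """SAGAN-style self-attention block (an assumed stand-in for the
    attention module integrated into the generator)."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions project features into query/key/value spaces;
        # channels must be >= 8 for the 8x reduction used here.
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w
        q = self.query(x).view(b, -1, n).permute(0, 2, 1)  # B x N x C'
        k = self.key(x).view(b, -1, n)                     # B x C' x N
        attn = F.softmax(torch.bmm(q, k), dim=-1)          # B x N x N
        v = self.value(x).view(b, -1, n)                   # B x C x N
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x  # residual: starts as identity (gamma = 0)

# Example: drop the block between two generator layers.
feat = torch.randn(1, 64, 32, 32)
print(SelfAttention(64)(feat).shape)  # torch.Size([1, 64, 32, 32])
```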
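The reported ACD-I/ACD-C/ACD-G numbers are average content distances computed from per-frame feature embeddings. A simplified sketch of one common formulation (mean L2 distance between consecutive frame embeddings, where lower means more temporally consistent) follows; the feature extractors behind the three variants are not specified in this abstract, so the helper below is hypothetical:

```python
import numpy as np

def average_content_distance(embeddings: np.ndarray) -> float:
    """Mean L2 distance between consecutive per-frame embeddings of one
    video; lower values indicate more consistent content across frames.

    `embeddings` has shape (num_frames, feature_dim).
    """
    diffs = embeddings[1:] - embeddings[:-1]  # frame-to-frame deltas
    return float(np.mean(np.linalg.norm(diffs, axis=1)))

# Usage (hypothetical encoder): acd = average_content_distance(encode(frames))
```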
Source Journal
Journal of Electrical and Computer Engineering (COMPUTER SCIENCE, INFORMATION SYSTEMS)
CiteScore: 4.20 · Self-citation rate: 0.00% · Articles published: 152 · Review time: 19 weeks