{"title":"Wav2Lip-HR: Synthesising clear high-resolution talking head in the wild","authors":"Chao Liang, Qinghua Wang, Yunlin Chen, Minjie Tang","doi":"10.1002/cav.2226","DOIUrl":null,"url":null,"abstract":"<p>Talking head generation aims to synthesize a photo-realistic speaking video with accurate lip motion. While this field has attracted more attention in recent audio-visual researches, most existing methods do not achieve the simultaneous improvement of lip synchronization and visual quality. In this paper, we propose Wav2Lip-HR, a neural-based audio-driven high-resolution talking head generation method. With our technique, all required to generate a clear high-resolution lip sync talking video is an image/video of the target face and an audio clip of any speech. The primary benefit of our method is that it generates clear high-resolution videos with sufficient facial details, rather than the ones just be large-sized with less clarity. We first analyze key factors that limit the clarity of generated videos and then put forth several important solutions to address the problem, including data augmentation, model structure improvement and a more effective loss function. Finally, we employ several efficient metrics to evaluate the clarity of images generated by our proposed approach as well as several widely used metrics to evaluate lip-sync performance. Numerous experiments demonstrate that our method has superior performance on visual quality and lip synchronization when compared to other existing schemes.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 1","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2023-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Animation and Virtual Worlds","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cav.2226","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0
Abstract
Talking head generation aims to synthesize a photo-realistic speaking video with accurate lip motion. While this field has attracted more attention in recent audio-visual researches, most existing methods do not achieve the simultaneous improvement of lip synchronization and visual quality. In this paper, we propose Wav2Lip-HR, a neural-based audio-driven high-resolution talking head generation method. With our technique, all required to generate a clear high-resolution lip sync talking video is an image/video of the target face and an audio clip of any speech. The primary benefit of our method is that it generates clear high-resolution videos with sufficient facial details, rather than the ones just be large-sized with less clarity. We first analyze key factors that limit the clarity of generated videos and then put forth several important solutions to address the problem, including data augmentation, model structure improvement and a more effective loss function. Finally, we employ several efficient metrics to evaluate the clarity of images generated by our proposed approach as well as several widely used metrics to evaluate lip-sync performance. Numerous experiments demonstrate that our method has superior performance on visual quality and lip synchronization when compared to other existing schemes.
期刊介绍:
With the advent of very powerful PCs and high-end graphics cards, there has been an incredible development in Virtual Worlds, real-time computer animation and simulation, games. But at the same time, new and cheaper Virtual Reality devices have appeared allowing an interaction with these real-time Virtual Worlds and even with real worlds through Augmented Reality. Three-dimensional characters, especially Virtual Humans are now of an exceptional quality, which allows to use them in the movie industry. But this is only a beginning, as with the development of Artificial Intelligence and Agent technology, these characters will become more and more autonomous and even intelligent. They will inhabit the Virtual Worlds in a Virtual Life together with animals and plants.