Generating dynamic lip-syncing using target audio in a multimedia environment

Diksha Pawar, Prashant Borde, Pravin Yannawar
{"title":"Generating dynamic lip-syncing using target audio in a multimedia environment","authors":"Diksha Pawar,&nbsp;Prashant Borde,&nbsp;Pravin Yannawar","doi":"10.1016/j.nlp.2024.100084","DOIUrl":null,"url":null,"abstract":"<div><p>The presented research focuses on the challenging task of creating lip-sync facial videos that align with a specified target speech segment. A novel deep-learning model has been developed to produce precise synthetic lip movements corresponding to the speech extracted from an audio source. Consequently, there are instances where portions of the visual data may fall out of sync with the updated audio and this challenge is handled through, a novel strategy, leveraging insights from a robust lip-sync discriminator. Additionally, this study introduces fresh criteria and evaluation benchmarks for assessing lip synchronization in unconstrained videos. LipChanger demonstrates improved PSNR values, indicative of enhanced image quality. Furthermore, it exhibits highly accurate lip synthesis, as evidenced by lower LMD values and higher SSIM values. These outcomes suggest that the LipChanger approach holds significant potential for enhancing lip synchronization in talking face videos, resulting in more realistic lip movements. The proposed LipChanger model and its associated evaluation benchmarks show promise and could potentially contribute to advancements in lip-sync technology for unconstrained talking face videos.</p></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"8 ","pages":"Article 100084"},"PeriodicalIF":0.0000,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949719124000323/pdfft?md5=84516d2e22e4420f113635a3914da66f&pid=1-s2.0-S2949719124000323-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Natural Language Processing Journal","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2949719124000323","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

This research addresses the challenging task of generating lip-synced facial videos that align with a specified target speech segment. A novel deep-learning model, LipChanger, is developed to produce accurate synthetic lip movements corresponding to speech extracted from an audio source. When a video's audio track is replaced, portions of the visual data can fall out of sync with the new audio; this challenge is handled through a novel strategy that leverages insights from a robust lip-sync discriminator. The study also introduces new criteria and evaluation benchmarks for assessing lip synchronization in unconstrained videos. LipChanger demonstrates improved PSNR values, indicating enhanced image quality, and highly accurate lip synthesis, as evidenced by lower landmark distance (LMD) values and higher SSIM values. These results suggest that the LipChanger approach holds significant potential for improving lip synchronization in talking-face videos, producing more realistic lip movements, and that the model and its associated evaluation benchmarks could contribute to advances in lip-sync technology for unconstrained talking-face videos.
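
The abstract attributes the sync-correction strategy to a strong lip-sync discriminator but does not describe its architecture. The sketch below shows one common way such a discriminator is built (a SyncNet-style model, as popularized by Wav2Lip, that embeds a short window of mouth frames and the matching mel-spectrogram slice into a shared space and scores synchrony by cosine similarity). The layer sizes, input shapes, and class name are illustrative assumptions, not the paper's actual design.

```python
# A minimal sketch of a SyncNet-style lip-sync discriminator (assumed design,
# not LipChanger's actual architecture). It scores how well a short stack of
# mouth crops matches the corresponding audio window.

import torch
import torch.nn as nn
import torch.nn.functional as F


class LipSyncDiscriminator(nn.Module):
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        # Video branch: 5 stacked grayscale mouth crops, input (B, 5, 48, 96).
        self.video_encoder = nn.Sequential(
            nn.Conv2d(5, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, embed_dim),
        )
        # Audio branch: mel-spectrogram window, input (B, 1, 80, 16).
        self.audio_encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, frames: torch.Tensor, mel: torch.Tensor) -> torch.Tensor:
        v = F.normalize(self.video_encoder(frames), dim=-1)
        a = F.normalize(self.audio_encoder(mel), dim=-1)
        # Cosine similarity in [-1, 1], mapped to [0, 1] as a sync probability.
        return (F.cosine_similarity(v, a, dim=-1) + 1) / 2


if __name__ == "__main__":
    disc = LipSyncDiscriminator()
    scores = disc(torch.randn(2, 5, 48, 96), torch.randn(2, 1, 80, 16))
    print(scores.shape)  # torch.Size([2]), one sync score per clip
```

In the Wav2Lip-style recipe, such a discriminator is pre-trained on in-sync versus out-of-sync frame/audio pairs and then frozen to provide a sync loss for the generator; the abstract does not state whether LipChanger follows this recipe exactly.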
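
The reported metrics are standard and straightforward to reproduce. The following sketch computes PSNR and SSIM with scikit-image and a simple LMD as the mean Euclidean error between corresponding mouth landmarks; the frame size, landmark count, and synthetic data are assumptions for illustration, not the paper's evaluation code.

```python
# A minimal sketch of the evaluation metrics named in the abstract: PSNR and
# SSIM for frame quality, LMD (landmark distance) for lip accuracy.
# Frame sizes and landmark layout below are illustrative assumptions.

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def psnr(real: np.ndarray, generated: np.ndarray) -> float:
    """Peak signal-to-noise ratio between two uint8 RGB frames (higher is better)."""
    return peak_signal_noise_ratio(real, generated, data_range=255)


def ssim(real: np.ndarray, generated: np.ndarray) -> float:
    """Structural similarity between two uint8 RGB frames (closer to 1 is better)."""
    return structural_similarity(real, generated, channel_axis=-1, data_range=255)


def lmd(real_lms: np.ndarray, gen_lms: np.ndarray) -> float:
    """Mean Euclidean distance between corresponding mouth landmarks.

    Both arrays have shape (num_frames, num_landmarks, 2); lower is better.
    """
    return float(np.linalg.norm(real_lms - gen_lms, axis=-1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.integers(0, 256, (96, 96, 3), dtype=np.uint8)
    fake = real.copy()
    # Perturb the lower half of the frame, standing in for a resynthesized mouth.
    fake[48:, :, :] = rng.integers(0, 256, (48, 96, 3), dtype=np.uint8)
    print(f"PSNR: {psnr(real, fake):.2f} dB")
    print(f"SSIM: {ssim(real, fake):.3f}")
    lms_real = rng.random((10, 20, 2)) * 96          # hypothetical mouth landmarks
    lms_fake = lms_real + rng.normal(0, 1.5, lms_real.shape)
    print(f"LMD:  {lmd(lms_real, lms_fake):.3f} px")
```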
