A simple method for high quality artist-driven lip syncing

Yuyu Xu, Andrew W. Feng, Ari Shapiro
{"title":"一个简单的方法,高质量的艺术家驱动的口型","authors":"Yuyu Xu, Andrew W. Feng, Ari Shapiro","doi":"10.1145/2448196.2448229","DOIUrl":null,"url":null,"abstract":"Synchronizing the lip and mouth movements naturally along with animation is an important part of convincing 3D character performance. We present a simple, portable and editable lip-synchronization method that works for multiple languages, requires no machine learning, can be constructed by a skilled animator, runs in real time, and can be personalized for each character. Our method associates animation curves designed by an animator on a fixed set of static facial poses, with sequential pairs of phonemes (diphones), and then stitch the diphones together to create a set of curves for the facial poses. Diphone- and triphone-based methods have been explored in various previous works [Deng et al. 2006], often requiring machine learning. However, our experiments have shown that diphones are sufficient for producing high-quality lip syncing, and that longer sequences of phonemes are not necessary. Our experiments have shown that skilled animators can sufficiently generate the data needed for good quality results. Thus our algorithm does not need any specific rules about coarticulation, such as dominance functions [Cohen and Massaro 1993] or language rules. Such rules are implicit within the artist-produced data. In order to produce a tractable set of data, our method reduces the full set of 40 English phonemes to a smaller set of 21, which are then annotated by an animator. Once the full diphone set of animations has been generated, it can be reused for multiple characters. Each additional character requires a small set of eight static poses or blendshapes. In addition, each language requires a new set of diphones, although similar phonemes among languages can share the same diphone curves. We show how to reuse our English diphone set to adapt to a Mandarin diphone set.","PeriodicalId":91160,"journal":{"name":"Proceedings. ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games","volume":"3 1","pages":"181"},"PeriodicalIF":0.0000,"publicationDate":"2013-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"A simple method for high quality artist-driven lip syncing\",\"authors\":\"Yuyu Xu, Andrew W. Feng, Ari Shapiro\",\"doi\":\"10.1145/2448196.2448229\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Synchronizing the lip and mouth movements naturally along with animation is an important part of convincing 3D character performance. We present a simple, portable and editable lip-synchronization method that works for multiple languages, requires no machine learning, can be constructed by a skilled animator, runs in real time, and can be personalized for each character. Our method associates animation curves designed by an animator on a fixed set of static facial poses, with sequential pairs of phonemes (diphones), and then stitch the diphones together to create a set of curves for the facial poses. Diphone- and triphone-based methods have been explored in various previous works [Deng et al. 2006], often requiring machine learning. However, our experiments have shown that diphones are sufficient for producing high-quality lip syncing, and that longer sequences of phonemes are not necessary. Our experiments have shown that skilled animators can sufficiently generate the data needed for good quality results. 
Thus our algorithm does not need any specific rules about coarticulation, such as dominance functions [Cohen and Massaro 1993] or language rules. Such rules are implicit within the artist-produced data. In order to produce a tractable set of data, our method reduces the full set of 40 English phonemes to a smaller set of 21, which are then annotated by an animator. Once the full diphone set of animations has been generated, it can be reused for multiple characters. Each additional character requires a small set of eight static poses or blendshapes. In addition, each language requires a new set of diphones, although similar phonemes among languages can share the same diphone curves. We show how to reuse our English diphone set to adapt to a Mandarin diphone set.\",\"PeriodicalId\":91160,\"journal\":{\"name\":\"Proceedings. ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games\",\"volume\":\"3 1\",\"pages\":\"181\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2448196.2448229\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2448196.2448229","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

Synchronizing lip and mouth movements naturally with animation is an important part of convincing 3D character performance. We present a simple, portable, and editable lip-synchronization method that works for multiple languages, requires no machine learning, can be constructed by a skilled animator, runs in real time, and can be personalized for each character. Our method associates animation curves, designed by an animator on a fixed set of static facial poses, with sequential pairs of phonemes (diphones), and then stitches the diphones together to create a set of curves for the facial poses. Diphone- and triphone-based methods have been explored in various previous works [Deng et al. 2006], often requiring machine learning. However, our experiments show that diphones are sufficient for producing high-quality lip syncing, and that longer sequences of phonemes are not necessary. They also show that skilled animators can generate sufficient data for good-quality results. Thus our algorithm does not need any specific rules about coarticulation, such as dominance functions [Cohen and Massaro 1993] or language rules; such rules are implicit within the artist-produced data. To produce a tractable set of data, our method reduces the full set of 40 English phonemes to a smaller set of 21, which are then annotated by an animator. Once the full diphone set of animations has been generated, it can be reused for multiple characters. Each additional character requires a small set of eight static poses or blendshapes. In addition, each language requires a new set of diphones, although similar phonemes among languages can share the same diphone curves. We show how to reuse our English diphone set to adapt to a Mandarin diphone set.
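The core idea described in the abstract is to look up an artist-authored animation curve for each consecutive phoneme pair (diphone) and stitch those curves together over the utterance's timeline. The sketch below illustrates only that stitching step; it is not the authors' implementation. The diphone library, pose names, and phoneme timings are hypothetical stand-ins, and a real pipeline would take timings from a forced aligner and blend overlapping keys rather than simply concatenating them.

```python
"""Minimal sketch of diphone-based curve stitching (illustrative only)."""
from collections import defaultdict

# Hypothetical artist-authored library:
# (phoneme_a, phoneme_b) -> {pose_name: [(t_normalized, weight), ...]}
# Times are normalized to the diphone's duration.
DIPHONE_CURVES = {
    ("HH", "AH"): {"open": [(0.0, 0.0), (0.5, 0.7), (1.0, 0.3)]},
    ("AH", "L"):  {"open": [(0.0, 0.3), (1.0, 0.1)],
                   "tongue_up": [(0.0, 0.0), (1.0, 0.8)]},
}

def stitch_diphones(phonemes, timings):
    """Stitch per-diphone curves into absolute-time curves per facial pose.

    phonemes: list of phoneme symbols, e.g. ["HH", "AH", "L"]
    timings:  list of (start_sec, end_sec) for each phoneme
    Returns {pose_name: [(time_sec, weight), ...]} sorted by time.
    """
    tracks = defaultdict(list)
    for (a, b), (ta, tb) in zip(zip(phonemes, phonemes[1:]),
                                zip(timings, timings[1:])):
        curves = DIPHONE_CURVES.get((a, b))
        if curves is None:
            continue  # unseen diphone: skipped in this sketch
        start, end = ta[0], tb[1]          # diphone spans both phonemes
        duration = end - start
        for pose, keys in curves.items():
            for t_norm, weight in keys:
                tracks[pose].append((start + t_norm * duration, weight))
    # Overlapping keys from adjacent diphones would be blended in a real
    # implementation; here they are simply kept in time order.
    return {pose: sorted(keys) for pose, keys in tracks.items()}

if __name__ == "__main__":
    phonemes = ["HH", "AH", "L"]
    timings = [(0.00, 0.10), (0.10, 0.25), (0.25, 0.40)]
    for pose, keys in stitch_diphones(phonemes, timings).items():
        print(pose, keys)
```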