{"title":"一个简单的方法,高质量的艺术家驱动的口型","authors":"Yuyu Xu, Andrew W. Feng, Ari Shapiro","doi":"10.1145/2448196.2448229","DOIUrl":null,"url":null,"abstract":"Synchronizing the lip and mouth movements naturally along with animation is an important part of convincing 3D character performance. We present a simple, portable and editable lip-synchronization method that works for multiple languages, requires no machine learning, can be constructed by a skilled animator, runs in real time, and can be personalized for each character. Our method associates animation curves designed by an animator on a fixed set of static facial poses, with sequential pairs of phonemes (diphones), and then stitch the diphones together to create a set of curves for the facial poses. Diphone- and triphone-based methods have been explored in various previous works [Deng et al. 2006], often requiring machine learning. However, our experiments have shown that diphones are sufficient for producing high-quality lip syncing, and that longer sequences of phonemes are not necessary. Our experiments have shown that skilled animators can sufficiently generate the data needed for good quality results. Thus our algorithm does not need any specific rules about coarticulation, such as dominance functions [Cohen and Massaro 1993] or language rules. Such rules are implicit within the artist-produced data. In order to produce a tractable set of data, our method reduces the full set of 40 English phonemes to a smaller set of 21, which are then annotated by an animator. Once the full diphone set of animations has been generated, it can be reused for multiple characters. Each additional character requires a small set of eight static poses or blendshapes. In addition, each language requires a new set of diphones, although similar phonemes among languages can share the same diphone curves. We show how to reuse our English diphone set to adapt to a Mandarin diphone set.","PeriodicalId":91160,"journal":{"name":"Proceedings. ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games","volume":"3 1","pages":"181"},"PeriodicalIF":0.0000,"publicationDate":"2013-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"A simple method for high quality artist-driven lip syncing\",\"authors\":\"Yuyu Xu, Andrew W. Feng, Ari Shapiro\",\"doi\":\"10.1145/2448196.2448229\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Synchronizing the lip and mouth movements naturally along with animation is an important part of convincing 3D character performance. We present a simple, portable and editable lip-synchronization method that works for multiple languages, requires no machine learning, can be constructed by a skilled animator, runs in real time, and can be personalized for each character. Our method associates animation curves designed by an animator on a fixed set of static facial poses, with sequential pairs of phonemes (diphones), and then stitch the diphones together to create a set of curves for the facial poses. Diphone- and triphone-based methods have been explored in various previous works [Deng et al. 2006], often requiring machine learning. However, our experiments have shown that diphones are sufficient for producing high-quality lip syncing, and that longer sequences of phonemes are not necessary. Our experiments have shown that skilled animators can sufficiently generate the data needed for good quality results. 
Thus our algorithm does not need any specific rules about coarticulation, such as dominance functions [Cohen and Massaro 1993] or language rules. Such rules are implicit within the artist-produced data. In order to produce a tractable set of data, our method reduces the full set of 40 English phonemes to a smaller set of 21, which are then annotated by an animator. Once the full diphone set of animations has been generated, it can be reused for multiple characters. Each additional character requires a small set of eight static poses or blendshapes. In addition, each language requires a new set of diphones, although similar phonemes among languages can share the same diphone curves. We show how to reuse our English diphone set to adapt to a Mandarin diphone set.\",\"PeriodicalId\":91160,\"journal\":{\"name\":\"Proceedings. ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games\",\"volume\":\"3 1\",\"pages\":\"181\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2448196.2448229\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2448196.2448229","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Synchronizing lip and mouth movements naturally with the rest of a character's animation is an important part of a convincing 3D character performance. We present a simple, portable, and editable lip-synchronization method that works for multiple languages, requires no machine learning, can be constructed by a skilled animator, runs in real time, and can be personalized for each character. Our method associates animation curves, designed by an animator on a fixed set of static facial poses, with sequential pairs of phonemes (diphones), and then stitches the diphone curves together to create a set of curves for the facial poses. Diphone- and triphone-based methods have been explored in previous work [Deng et al. 2006], often requiring machine learning. However, our experiments show that diphones are sufficient for producing high-quality lip syncing and that longer phoneme sequences are not necessary. They also show that skilled animators can readily produce the data needed for good-quality results. Thus our algorithm needs no explicit rules about coarticulation, such as dominance functions [Cohen and Massaro 1993], or language-specific rules; such rules are implicit in the artist-produced data. To keep the data set tractable, our method reduces the full set of 40 English phonemes to a smaller set of 21, which an animator then annotates. Once the full set of diphone animations has been generated, it can be reused for multiple characters; each additional character requires only a small set of eight static poses or blendshapes. Each language requires a new set of diphones, although similar phonemes across languages can share the same diphone curves. We show how to reuse our English diphone set to adapt it to a Mandarin diphone set.
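The abstract only sketches the core mechanism (artist-authored curves per diphone, stitched into per-pose tracks), so the following is a minimal illustrative sketch of how that stitching might look in code. It is not the authors' implementation: the pose names, the (time, value) keyframe layout of the diphone library, and the simple additive blend where consecutive diphones overlap are all assumptions made for illustration.

```python
# Minimal sketch of diphone-curve stitching (illustrative assumptions, see above).
from bisect import bisect_right

# Hypothetical stand-ins for the eight static poses/blendshapes per character.
POSES = ["open", "wide", "FV", "PBM", "ShCh", "W", "tongue_back", "tongue_teeth"]


def sample_curve(keys, t):
    """Linearly interpolate an artist-authored curve given as sorted (t, value)
    keys on a normalized 0..1 timeline."""
    if not keys:
        return 0.0
    if t <= keys[0][0]:
        return keys[0][1]
    if t >= keys[-1][0]:
        return keys[-1][1]
    i = bisect_right([k[0] for k in keys], t)
    (t0, v0), (t1, v1) = keys[i - 1], keys[i]
    return v0 + (t - t0) / (t1 - t0) * (v1 - v0)


def stitch_diphones(phoneme_timings, diphone_library, fps=30):
    """Build per-pose weight tracks for an utterance.

    phoneme_timings: list of (phoneme, start_sec, end_sec) from a forced aligner.
    diphone_library: dict mapping (phoneme_a, phoneme_b) to {pose: [(t, value), ...]}
        with t normalized to the diphone's duration and pose names drawn from POSES.
    Returns {pose: [weight per frame]} covering the whole utterance.
    """
    end_time = phoneme_timings[-1][2]
    n_frames = int(end_time * fps) + 1
    tracks = {pose: [0.0] * n_frames for pose in POSES}

    # Each consecutive phoneme pair is one diphone; its curves span from the
    # start of the first phoneme to the end of the second, so neighbouring
    # diphones overlap by one phoneme. Their contributions are simply summed
    # here; the paper's actual blending rule is not specified in the abstract.
    for (ph_a, a_start, _), (ph_b, _, b_end) in zip(phoneme_timings, phoneme_timings[1:]):
        curves = diphone_library.get((ph_a, ph_b))
        if curves is None:
            continue  # unauthored pair: leave the face at neutral for this span
        span = b_end - a_start
        first = int(a_start * fps)
        last = min(int(b_end * fps), n_frames - 1)
        for frame in range(first, last + 1):
            t = (frame / fps - a_start) / span  # normalized time inside the diphone
            for pose, keys in curves.items():
                tracks[pose][frame] += sample_curve(keys, t)

    return tracks
```

A full pipeline would additionally need the mapping from the 40 English phonemes down to the reduced set of 21, retargeting of the shared diphone curves onto each character's own eight poses or blendshapes, and some smoothing or normalization across diphone boundaries.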