{"title":"语音动画的音素级发音器动态","authors":"Sheng Li, Lan Wang, En Qi","doi":"10.1109/IALP.2011.13","DOIUrl":null,"url":null,"abstract":"Speech visualization can be extended to a task of pronunciation animation for language learners. In this paper, a three dimensional English articulation database is recorded using Carstens Electro-Magnetic Articulograph (EMA AG500). An HMM-based visual synthesis method for continuous speech is implemented to recover 3D articulatory information. The synthesized articulations are then compared to the EMA recordings for objective evaluation. Using a data-driven 3D talking head, the distinctions between the confusable phonemes can be depicted through both external and internal articulatory movements. The experiments have demonstrated that the HMM-based synthesis with limited training data can achieve the minimum RMS error of less than 2mm. The synthesized articulatory movements can be used for computer assisted pronunciation training.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"The Phoneme-Level Articulator Dynamics for Pronunciation Animation\",\"authors\":\"Sheng Li, Lan Wang, En Qi\",\"doi\":\"10.1109/IALP.2011.13\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Speech visualization can be extended to a task of pronunciation animation for language learners. In this paper, a three dimensional English articulation database is recorded using Carstens Electro-Magnetic Articulograph (EMA AG500). An HMM-based visual synthesis method for continuous speech is implemented to recover 3D articulatory information. The synthesized articulations are then compared to the EMA recordings for objective evaluation. Using a data-driven 3D talking head, the distinctions between the confusable phonemes can be depicted through both external and internal articulatory movements. The experiments have demonstrated that the HMM-based synthesis with limited training data can achieve the minimum RMS error of less than 2mm. The synthesized articulatory movements can be used for computer assisted pronunciation training.\",\"PeriodicalId\":297167,\"journal\":{\"name\":\"2011 International Conference on Asian Language Processing\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-11-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 International Conference on Asian Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IALP.2011.13\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 International Conference on Asian Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IALP.2011.13","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The Phoneme-Level Articulator Dynamics for Pronunciation Animation
Speech visualization can be extended to pronunciation animation for language learners. In this paper, a three-dimensional English articulation database is recorded using a Carstens Electro-Magnetic Articulograph (EMA AG500). An HMM-based visual synthesis method for continuous speech is implemented to recover 3D articulatory information. The synthesized articulations are then compared with the EMA recordings for objective evaluation. Using a data-driven 3D talking head, the distinctions between confusable phonemes can be depicted through both external and internal articulatory movements. The experiments demonstrate that HMM-based synthesis with limited training data can achieve a minimum RMS error of less than 2 mm. The synthesized articulatory movements can be used for computer-assisted pronunciation training.
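The objective evaluation described above compares the synthesized articulatory trajectories against the EMA recordings by RMS error. Below is a minimal sketch of that comparison, assuming trajectories are stored as (frames, coils, 3) position arrays in millimetres; the coil inventory, frame count, and data layout are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch (not the authors' code): RMS error between synthesized
# and recorded EMA articulatory trajectories.
import numpy as np

def rms_error(synth: np.ndarray, recorded: np.ndarray) -> float:
    """RMS Euclidean error (mm) between two (frames, coils, 3) trajectories."""
    assert synth.shape == recorded.shape
    # Per-frame, per-coil Euclidean distance between predicted and measured positions.
    dist = np.linalg.norm(synth - recorded, axis=-1)
    return float(np.sqrt(np.mean(dist ** 2)))

# Example with synthetic data: 200 frames, 6 hypothetical EMA coils
# (e.g. tongue tip/body/dorsum, upper lip, lower lip, jaw), 3D positions in mm.
rng = np.random.default_rng(0)
recorded = rng.normal(scale=5.0, size=(200, 6, 3))
synth = recorded + rng.normal(scale=1.0, size=recorded.shape)  # ~1 mm deviation
print(f"RMS error: {rms_error(synth, recorded):.2f} mm")
```

Reporting a single RMS figure over all coils and frames matches the kind of sub-2 mm summary quoted in the abstract; per-phoneme or per-coil breakdowns would follow the same computation restricted to the relevant frames.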