Multimodal speech recognition of a person with articulation disorders using AAM and MAF

Chikoto Miyamoto, Yuto Komai, T. Takiguchi, Y. Ariki, I. Li

2010 IEEE International Workshop on Multimedia Signal Processing, 10 December 2010. DOI: 10.1109/MMSP.2010.5662075
We investigated the speech recognition of a person with articulation disorders resulting from athetoid cerebral palsy. Strain on speech-related muscles tends to make articulation unstable, which degrades speech recognition performance. We therefore use multiple acoustic frames (MAF) as an acoustic feature to address this problem. Furthermore, in real environments, current speech recognition systems do not perform well enough because of noise. In addition to acoustic features, we use visual features to increase noise robustness in real environments. However, recognition problems also arise because people with cerebral palsy tend to move their heads erratically. We investigate a pose-robust audio-visual speech recognition method that uses an Active Appearance Model (AAM) to address this problem for people with articulation disorders resulting from athetoid cerebral palsy. AAMs are used for face tracking to extract pose-robust facial feature points. The method's effectiveness is confirmed by word recognition experiments on the noisy speech of a person with articulation disorders.
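The multiple-acoustic-frames idea can be illustrated with a short sketch. The abstract does not give the paper's exact feature pipeline, so the following is only an assumed, minimal version of the common approach: each per-frame acoustic feature vector (e.g. MFCCs) is concatenated with its neighbouring frames, so that frame-to-frame instability in the articulation is captured inside a single, wider feature vector. The function name `stack_frames` and the context width are illustrative, not from the paper.

```python
import numpy as np

def stack_frames(features, context=2):
    """Build multiple-acoustic-frame (MAF) vectors by stacking neighbours.

    features: (T, D) array of per-frame acoustic features (e.g. MFCCs).
    context:  number of frames taken on each side of the current frame.
    Returns a (T, D * (2*context + 1)) array; edge frames are handled by
    repeating the first/last frame (edge padding).
    """
    T, D = features.shape
    # Pad along the time axis only, so every frame has full context.
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    # Concatenate each window of 2*context+1 frames into one long vector.
    stacked = [padded[t : t + 2 * context + 1].reshape(-1) for t in range(T)]
    return np.asarray(stacked)
```

With `context=2`, a 39-dimensional MFCC frame becomes a 195-dimensional MAF vector spanning five consecutive frames; these vectors would then feed the acoustic model in place of single frames.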