基于Kullback-Leibler散度的非母语语音识别声学建模

2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2011-12-01 DOI:10.1109/ASRU.2011.6163956

David Imseng, Ramya Rasipuram, M. Magimai.-Doss

{"title":"基于Kullback-Leibler散度的非母语语音识别声学建模","authors":"David Imseng, Ramya Rasipuram, M. Magimai.-Doss","doi":"10.1109/ASRU.2011.6163956","DOIUrl":null,"url":null,"abstract":"One of the main challenge in non-native speech recognition is how to handle acoustic variability present in multi-accented non-native speech with limited amount of training data. In this paper, we investigate an approach that addresses this challenge by using Kullback-Leibler divergence based hidden Markov models (KL-HMM). More precisely, the acoustic variability in the multi-accented speech is handled by using multilingual phoneme posterior probabilities, estimated by a multilayer perceptron trained on auxiliary data, as input feature for the KL-HMM system. With limited training data, we then build better acoustic models by exploiting the advantage that the KL-HMM system has fewer number of parameters. On HIWIRE corpus, the proposed approach yields a performance of 1.9% word error rate (WER) with 149 minutes of training data and a performance of 5.5% WER with 2 minutes of training data.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"27","resultStr":"{\"title\":\"Fast and flexible Kullback-Leibler divergence based acoustic modeling for non-native speech recognition\",\"authors\":\"David Imseng, Ramya Rasipuram, M. Magimai.-Doss\",\"doi\":\"10.1109/ASRU.2011.6163956\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"One of the main challenge in non-native speech recognition is how to handle acoustic variability present in multi-accented non-native speech with limited amount of training data. In this paper, we investigate an approach that addresses this challenge by using Kullback-Leibler divergence based hidden Markov models (KL-HMM). More precisely, the acoustic variability in the multi-accented speech is handled by using multilingual phoneme posterior probabilities, estimated by a multilayer perceptron trained on auxiliary data, as input feature for the KL-HMM system. With limited training data, we then build better acoustic models by exploiting the advantage that the KL-HMM system has fewer number of parameters. On HIWIRE corpus, the proposed approach yields a performance of 1.9% word error rate (WER) with 149 minutes of training data and a performance of 5.5% WER with 2 minutes of training data.\",\"PeriodicalId\":338241,\"journal\":{\"name\":\"2011 IEEE Workshop on Automatic Speech Recognition & Understanding\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"27\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE Workshop on Automatic Speech Recognition & Understanding\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU.2011.6163956\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2011.6163956","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 27

摘要

非母语语音识别面临的主要挑战之一是如何在有限的训练数据下处理多口音非母语语音中的声学变异性。在本文中，我们研究了一种通过使用基于Kullback-Leibler散度的隐马尔可夫模型(KL-HMM)来解决这一挑战的方法。更准确地说，多重音语音中的声学变异性是通过使用多语言音素后验概率来处理的，由辅助数据训练的多层感知器估计，作为KL-HMM系统的输入特征。在训练数据有限的情况下，我们利用KL-HMM系统参数较少的优势，建立了更好的声学模型。在HIWIRE语料库上，该方法在149分钟的训练数据下产生1.9%的词错误率(WER)，在2分钟的训练数据下产生5.5%的词错误率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Fast and flexible Kullback-Leibler divergence based acoustic modeling for non-native speech recognition

One of the main challenge in non-native speech recognition is how to handle acoustic variability present in multi-accented non-native speech with limited amount of training data. In this paper, we investigate an approach that addresses this challenge by using Kullback-Leibler divergence based hidden Markov models (KL-HMM). More precisely, the acoustic variability in the multi-accented speech is handled by using multilingual phoneme posterior probabilities, estimated by a multilayer perceptron trained on auxiliary data, as input feature for the KL-HMM system. With limited training data, we then build better acoustic models by exploiting the advantage that the KL-HMM system has fewer number of parameters. On HIWIRE corpus, the proposed approach yields a performance of 1.9% word error rate (WER) with 149 minutes of training data and a performance of 5.5% WER with 2 minutes of training data.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 IEEE Workshop on Automatic Speech Recognition & Understanding

自引率

0.00%

发文量

期刊最新文献

Applying feature bagging for more accurate and robust automated speaking assessment Towards choosing better primes for spoken dialog systems Accent level adjustment in bilingual Thai-English text-to-speech synthesis Fast speaker diarization using a high-level scripting language Evaluating prosodic features for automated scoring of non-native read speech