Improved Minimum Phone Error based Discriminative Training of Acoustic Models for Mandarin Large Vocabulary Continuous Speech Recognition

Int. J. Comput. Linguistics Chin. Lang. Process., published 2008-09-01, DOI: 10.30019/IJCLCLP.200809.0005

Shih-Hung Liu, Fang-Hui Chu, Yueng-Tien Lo, Berlin Chen

Citations: 2

Abstract

This paper considers minimum phone error (MPE) based discriminative training of acoustic models for Mandarin broadcast news recognition. We present a new phone accuracy function based on the frame-level accuracy of hypothesized phone arcs, in place of the raw phone accuracy function used in standard MPE training. We also explore a novel data selection approach based on the frame-level normalized entropy of Gaussian posterior probabilities obtained from the word lattice of each training utterance. This approach makes the training algorithm focus more on the training statistics of frame samples that lie near the decision boundary, for better discrimination. We investigate the underlying characteristics of the presented approaches extensively and verify their performance by comparison with the standard MPE training approach and with other related work. Experiments on broadcast news collected in Taiwan demonstrate that integrating the frame-level phone accuracy calculation with data selection yields slight but consistent improvements over the baseline system.
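The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation.

`mpe_raw_accuracy` follows the approximate phone accuracy commonly used in standard MPE training. For a hypothesized arc q, it takes the maximum over reference phones z of -1 + 2e when the labels match and -1 + e otherwise, where e is the fraction of z's duration overlapped by q. `frame_level_accuracy` is a hypothetical frame-based alternative that scores the fraction of frames in the arc whose reference label matches, and the paper's exact definition may differ.

`normalized_entropy` and `select_frames` sketch entropy-based data selection over per-frame Gaussian posteriors. The threshold value, the normalization by log M, and the choice to keep high-entropy frames as those "near the decision boundary" are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

# A phone arc: (phone label, start frame inclusive, end frame exclusive).
Arc = Tuple[str, int, int]


def mpe_raw_accuracy(hyp: Arc, ref_arcs: Sequence[Arc]) -> float:
    """Approximate MPE phone accuracy: max over reference phones z of
    -1 + 2e (same phone) or -1 + e (different phone), where e is the
    fraction of z's duration overlapped by the hypothesized arc."""
    phone, start, end = hyp
    best = -1.0  # score for a reference phone with no overlap (e = 0)
    for z_phone, z_start, z_end in ref_arcs:
        overlap = max(0, min(end, z_end) - max(start, z_start))
        if overlap == 0:
            continue
        e = overlap / (z_end - z_start)
        score = -1.0 + 2.0 * e if z_phone == phone else -1.0 + e
        best = max(best, score)
    return best


def frame_level_accuracy(hyp: Arc, ref_frames: Sequence[str]) -> float:
    """Hypothetical frame-level variant: the fraction of frames spanned by
    the arc whose frame-aligned reference phone equals the arc's phone."""
    phone, start, end = hyp
    if end <= start:
        return 0.0
    correct = sum(1 for t in range(start, end) if ref_frames[t] == phone)
    return correct / (end - start)


def normalized_entropy(posteriors: Sequence[float]) -> float:
    """Entropy of one frame's Gaussian posteriors divided by log(M):
    0 means one Gaussian takes all the mass, 1 means a uniform spread."""
    m = len(posteriors)
    if m < 2:
        return 0.0
    h = sum(-p * math.log(p) for p in posteriors if p > 0.0)
    return h / math.log(m)


def select_frames(frame_posteriors: Sequence[Sequence[float]],
                  threshold: float) -> List[int]:
    """Return indices of frames whose normalized entropy is at least the
    (hypothetical) threshold, i.e. the most ambiguous frames."""
    return [t for t, post in enumerate(frame_posteriors)
            if normalized_entropy(post) >= threshold]
```

With a reference alignment of phone "a" over frames 0 to 3 and "b" over frames 4 to 9, a hypothesized "b" arc spanning frames 2 to 9 receives full raw credit (1.0), because it covers all of the reference "b". The frame-level variant gives it 0.75, because 2 of its 8 frames are actually "a". This kind of boundary mismatch is what a frame-level accuracy is meant to penalize.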
