Robust speech recognition with on-line unsupervised acoustic feature compensation

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI:10.1109/ASRU.2007.4430092

L. Buera, A. Miguel, EDUARDO LLEIDA SOLANO, Oscar Saz-Torralba, A. Ortega

{"title":"Robust speech recognition with on-line unsupervised acoustic feature compensation","authors":"L. Buera, A. Miguel, EDUARDO LLEIDA SOLANO, Oscar Saz-Torralba, A. Ortega","doi":"10.1109/ASRU.2007.4430092","DOIUrl":null,"url":null,"abstract":"An on-line unsupervised hybrid compensation technique is proposed to reduce the mismatch between training and testing conditions. It combines multi-environment model based linear normalization with cross-probability model based on GMMs (MEMLIN CPM) with a novel acoustic model adaptation method based on rotation transformations. Hence, a set of rotation transformations is estimated with clean and MEMLIN CPM-normalized training data by linear regression in an unsupervised process. Thus, in testing, each MEMLIN CPM normalized frame is decoded using a modified Viterbi algorithm and expanded acoustic models, which are obtained from the reference ones and the set of rotation transformations. To test the proposed solution, some experiments with Spanish SpeechDat Car database were carried out. MEMLIN CPM over standard ETSI front-end parameters reaches 83.89% of average improvement in WER, while the introduced hybrid solution goes up to 92.07%. Also, the proposed hybrid technique was tested with Aurora 2 database, obtaining an average improvement of 68.88% with clean training.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2007.4430092","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

Abstract

An on-line unsupervised hybrid compensation technique is proposed to reduce the mismatch between training and testing conditions. It combines multi-environment model based linear normalization with cross-probability model based on GMMs (MEMLIN CPM) with a novel acoustic model adaptation method based on rotation transformations. Hence, a set of rotation transformations is estimated with clean and MEMLIN CPM-normalized training data by linear regression in an unsupervised process. Thus, in testing, each MEMLIN CPM normalized frame is decoded using a modified Viterbi algorithm and expanded acoustic models, which are obtained from the reference ones and the set of rotation transformations. To test the proposed solution, some experiments with Spanish SpeechDat Car database were carried out. MEMLIN CPM over standard ETSI front-end parameters reaches 83.89% of average improvement in WER, while the introduced hybrid solution goes up to 92.07%. Also, the proposed hybrid technique was tested with Aurora 2 database, obtaining an average improvement of 68.88% with clean training.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于在线无监督声学特征补偿的鲁棒语音识别

为了减少训练条件和测试条件之间的不匹配，提出了一种在线无监督混合补偿技术。将基于多环境模型的线性归一化与基于GMMs的交叉概率模型(MEMLIN CPM)相结合，提出了一种基于旋转变换的声学模型自适应方法。因此，在无监督过程中，使用clean和MEMLIN cpm归一化训练数据通过线性回归估计一组旋转变换。因此，在测试中，使用改进的Viterbi算法和扩展的声学模型对每个MEMLIN CPM归一化帧进行解码，这些声学模型是由参考模型和旋转变换集获得的。为了验证所提出的解决方案，在西班牙语语音数据库上进行了一些实验。MEMLIN在标准ETSI前端参数上的CPM达到了WER平均改进的83.89%，而引入混合方案的CPM达到了92.07%。在Aurora 2数据库中进行了混合技术的测试，经过清洁训练，平均提高了68.88%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

自引率

0.00%

发文量