Improved GMM-based language recognition using constrained MLLR transforms

2008 IEEE International Conference on Acoustics, Speech and Signal Processing. Pub Date: 2008-05-12. DOI: 10.1109/ICASSP.2008.4518568

Wade Shen, D. Reynolds


Citations: 19

Abstract

In this paper we describe the application of a feature-space transform based on constrained maximum likelihood linear regression (CMLLR) to the language recognition problem, where it provides unsupervised compensation of channel and speaker variability. We show that such transforms can improve baseline GMM-based language recognition performance on the 2005 NIST Language Recognition Evaluation (LRE05) task by 38%. Furthermore, gains from CMLLR are additive with other modeling enhancements such as vocal tract length normalization (VTLN). Further improvement is obtained using discriminative training, and it is shown that a system using only CMLLR adaptation produces state-of-the-art accuracy at a lower test-time computational cost than systems using VTLN.
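To make the idea of a feature-space transform concrete, the following is a minimal sketch, not the authors' implementation. It assumes a CMLLR transform (a matrix `A` and a bias `b`) has already been estimated for an utterance, applies it to each feature frame as x' = A x + b, and scores the transformed frames against a diagonal-covariance GMM, adding the log|det A| Jacobian term that a feature-space transform introduces. The function names, array shapes, and the per-frame averaging are illustrative assumptions; the abstract does not specify how the transform is estimated or how scores are combined.

```python
import numpy as np


def apply_cmllr(features, A, b):
    """Apply the affine feature-space transform x' = A x + b to each frame (row)."""
    return features @ A.T + b


def diag_gmm_loglik(features, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    features: (T, D), weights: (M,), means: (M, D), variances: (M, D)
    """
    diff = features[:, None, :] - means[None, :, :]                # (T, M, D)
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)  # (M,)
    exponent = -0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)  # (T, M)
    log_comp = np.log(weights)[None, :] + log_norm[None, :] + exponent
    # Log-sum-exp over mixture components for numerical stability.
    peak = log_comp.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_comp - peak).sum(axis=1, keepdims=True)))[:, 0]


def cmllr_score(features, A, b, weights, means, variances):
    """Average per-frame log-likelihood of transformed features.

    The log|det A| term accounts for the change of variables made by the
    feature-space transform (hypothetical helper, for illustration only).
    """
    _, log_abs_det = np.linalg.slogdet(A)
    transformed = apply_cmllr(features, A, b)
    return diag_gmm_loglik(transformed, weights, means, variances).mean() + log_abs_det
```

In a GMM-based language recognizer, a score like this would be computed against each language's model and the results compared. With `A` set to the identity and `b` to zero, the function reduces to ordinary GMM scoring.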
