MLLR/MAP adaptation using pronunciation variation for non-native speech recognition

2009 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2009-12-01. DOI: 10.1109/ASRU.2009.5373299

Y. Oh, H. Kim


Citations: 17

Abstract



In this paper, we propose an acoustic model adaptation method for non-native speech recognition that combines maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) adaptation with pronunciation variations. To this end, we first obtain pronunciation variations using an indirect data-driven approach. Next, we generate two sets of regression classes: one for all pronunciations, referred to as overall regression classes, and one for the pronunciation variations, referred to as pronunciation variation regression classes. We then sequentially apply the two adaptations to non-native speech using the overall regression classes, while the acoustic models associated with the pronunciation variations are adapted using the pronunciation variation regression classes. In the final step, both sets of adapted acoustic models are merged. Thus, the resultant acoustic models cover the characteristics of non-native speakers as well as the pronunciation variations of non-native speech. Non-native automatic speech recognition (ASR) experiments on continuous English speech spoken by Koreans show that an ASR system employing the proposed adaptation method achieves a relative reduction of 9.43% in average word error rate compared to a traditional MLLR/MAP adaptation method.
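The adaptation flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it adapts Gaussian means only, treats each model set as a single regression class, assumes a shared covariance so that the MLLR estimate reduces to weighted least squares, and uses a hypothetical merge rule (variation-adapted means overwrite the corresponding overall-adapted means). All function names and the `tau` prior weight are illustrative assumptions.

```python
import numpy as np

def estimate_mllr(means, frames, post):
    """Estimate an MLLR mean transform W = [b | A] for one regression class.

    Assumption: all Gaussians share one covariance, so the maximum
    likelihood estimate reduces to occupancy-weighted least squares.
    """
    xi = np.hstack([np.ones((len(means), 1)), means])  # extended means [1, mu]
    occ = post.sum(axis=0)                             # per-Gaussian occupancy
    z = post.T @ frames                                # sum_t gamma_t(m) x_t
    g = xi.T @ (occ[:, None] * xi)                     # sum_m occ_m xi_m xi_m^T
    return z.T @ xi @ np.linalg.pinv(g)                # shape (dim, dim + 1)

def apply_mllr(means, w):
    """Transform every mean: mu' = A mu + b."""
    xi = np.hstack([np.ones((len(means), 1)), means])
    return xi @ w.T

def map_update(prior_means, frames, post, tau):
    """MAP mean update: (tau * mu0 + sum_t gamma_t x_t) / (tau + sum_t gamma_t)."""
    occ = post.sum(axis=0)
    return (tau * prior_means + post.T @ frames) / (tau + occ)[:, None]

def adapt(means, frames, post, tau):
    """Sequential adaptation: MLLR first, then MAP with the MLLR means as prior."""
    w = estimate_mllr(means, frames, post)
    return map_update(apply_mllr(means, w), frames, post, tau)

def adapt_and_merge(means, frames, post, var_idx, tau=10.0):
    """Adapt all models and the variation-related models separately, then merge.

    var_idx lists the Gaussians assumed to belong to pronunciation variations.
    The overwrite-style merge below is a hypothetical choice for illustration.
    """
    overall = adapt(means, frames, post, tau)
    variation = adapt(means[var_idx], frames, post[:, var_idx], tau)
    merged = overall.copy()
    merged[var_idx] = variation
    return merged
```

Here `post` holds per-frame Gaussian occupation probabilities (frames by Gaussians), which in a real system would come from aligning the non-native adaptation data against the acoustic models. A full system would also use multiple regression classes per set and the actual covariances when estimating each transform.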
