MLLR/MAP adaptation using pronunciation variation for non-native speech recognition
Y. Oh, H. Kim
2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009-12-01
DOI: 10.1109/ASRU.2009.5373299 (https://doi.org/10.1109/ASRU.2009.5373299)
In this paper, we propose an acoustic model adaptation method for non-native speech recognition that combines maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) adaptation with pronunciation variations. To this end, we first obtain pronunciation variations using an indirect data-driven approach. Next, we generate two sets of regression classes: overall regression classes, which cover all pronunciations, and pronunciation variation regression classes, which cover only the pronunciation variations. We then sequentially apply the two adaptations to non-native speech using the overall regression classes, while the acoustic models associated with the pronunciation variations are adapted using the pronunciation variation regression classes. In the final step, both sets of adapted acoustic models are merged, so that the resulting acoustic models cover the characteristics of non-native speakers as well as the pronunciation variations of non-native speech. In non-native automatic speech recognition (ASR) experiments on continuous English speech spoken by Koreans, an ASR system employing the proposed adaptation method reduces the average word error rate by 9.43% relative to a traditional MLLR/MAP adaptation method.
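The adaptation pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It adapts Gaussian means only, and it assumes identity covariances and a hard frame-to-Gaussian alignment, so that the MLLR estimate reduces to ordinary least squares. The function names, the prior weight `tau`, and the merge rule (variation-adapted means override the overall ones for Gaussians flagged as variation-related) are hypothetical choices made for illustration; the abstract does not specify how the two model sets are merged.

```python
import numpy as np

def extend(means):
    """Prepend a bias term: each mean mu becomes the extended vector [1, mu]."""
    return np.hstack([np.ones((len(means), 1)), means])

def mllr_transform(means, frames, align):
    """Estimate one MLLR mean transform W of shape (d, d+1) for one regression class.

    Assumption (not from the paper): identity covariances and hard alignment,
    so the maximum likelihood estimate becomes a least-squares fit.
    """
    xi = extend(means)[align]  # one extended mean per frame, shape (T, d+1)
    w_t, *_ = np.linalg.lstsq(xi, frames, rcond=None)
    return w_t.T

def map_update(prior_means, frames, align, tau=10.0):
    """MAP mean update: (tau * prior + sum of aligned frames) / (tau + count)."""
    out = prior_means.copy()
    for m in range(len(prior_means)):
        obs = frames[align == m]
        out[m] = (tau * prior_means[m] + obs.sum(axis=0)) / (tau + len(obs))
    return out

def adapt(means, frames, align, classes, tau=10.0):
    """Apply MLLR per regression class, then MAP with the MLLR means as prior."""
    adapted = means.copy()
    for c in np.unique(classes):
        idx = np.flatnonzero(classes == c)  # Gaussians in this class (sorted)
        mask = np.isin(align, idx)          # frames aligned to those Gaussians
        if not mask.any():
            continue                        # no data: keep the original means
        local = np.searchsorted(idx, align[mask])  # global -> class-local index
        w = mllr_transform(means[idx], frames[mask], local)
        adapted[idx] = extend(means[idx]) @ w.T
    return map_update(adapted, frames, align, tau)

def merge(overall, variation, is_variation):
    """Hypothetical merge: use variation-adapted means where flagged."""
    return np.where(is_variation[:, None], variation, overall)

# Toy data: 6 two-dimensional Gaussians, 3 adaptation frames per Gaussian.
means = np.array([[0., 0.], [1., 0.], [0., 1.], [5., 5.], [6., 5.], [5., 6.]])
align = np.repeat(np.arange(6), 3)
rng = np.random.default_rng(0)
frames = means[align] + np.array([1.0, -1.0]) + 0.1 * rng.standard_normal((18, 2))

overall_classes = np.zeros(6, dtype=int)          # one class for all Gaussians
variation_classes = np.array([0, 0, 0, 1, 1, 1])  # finer classes (illustrative)
is_variation = np.array([False, False, False, True, True, True])

overall = adapt(means, frames, align, overall_classes)
variation = adapt(means, frames, align, variation_classes)
merged = merge(overall, variation, is_variation)
```

In this sketch both passes run over all Gaussians for simplicity, and only the merge step restricts the variation-adapted means to the flagged models. A real system would estimate the transforms from covariance-weighted occupation statistics rather than from a plain least-squares fit.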