{"title":"基于说话人相关特征电话估计的说话人自适应","authors":"Wenlin Zhang, Weiqiang Zhang, Bi-cheng Li","doi":"10.1109/ASRU.2011.6163904","DOIUrl":null,"url":null,"abstract":"Based on speaker dependent eigenphone estimation, a novel speaker adaptation technique is proposed in this paper. Different from conventional speaker adaptation approaches, the proposed method explicitly models the phone variations for each speaker through subspace modeling in the phone space. The phone coordinate, which is shared by all speakers, contains correlation information between different phones. During speaker adaptation, two schemes for estimation of the new speaker specific phone variation bases (namely eigenphones) are derived under maximum likelihood (ML) criterion and maximum a posteriori (MAP) criterion respectively. Supervised speaker adaptation experiments on a Mandarin Chinese continuous speech recognition task show that the new method outperforms both eigenvoice and maximum likelihood linear regression (MLLR) methods when sufficient adaptation data is available.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Speaker adaptation based on speaker-dependent eigenphone estimation\",\"authors\":\"Wenlin Zhang, Weiqiang Zhang, Bi-cheng Li\",\"doi\":\"10.1109/ASRU.2011.6163904\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Based on speaker dependent eigenphone estimation, a novel speaker adaptation technique is proposed in this paper. Different from conventional speaker adaptation approaches, the proposed method explicitly models the phone variations for each speaker through subspace modeling in the phone space. The phone coordinate, which is shared by all speakers, contains correlation information between different phones. During speaker adaptation, two schemes for estimation of the new speaker specific phone variation bases (namely eigenphones) are derived under maximum likelihood (ML) criterion and maximum a posteriori (MAP) criterion respectively. Supervised speaker adaptation experiments on a Mandarin Chinese continuous speech recognition task show that the new method outperforms both eigenvoice and maximum likelihood linear regression (MLLR) methods when sufficient adaptation data is available.\",\"PeriodicalId\":338241,\"journal\":{\"name\":\"2011 IEEE Workshop on Automatic Speech Recognition & Understanding\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE Workshop on Automatic Speech Recognition & Understanding\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU.2011.6163904\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2011.6163904","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Speaker adaptation based on speaker-dependent eigenphone estimation
Based on speaker dependent eigenphone estimation, a novel speaker adaptation technique is proposed in this paper. Different from conventional speaker adaptation approaches, the proposed method explicitly models the phone variations for each speaker through subspace modeling in the phone space. The phone coordinate, which is shared by all speakers, contains correlation information between different phones. During speaker adaptation, two schemes for estimation of the new speaker specific phone variation bases (namely eigenphones) are derived under maximum likelihood (ML) criterion and maximum a posteriori (MAP) criterion respectively. Supervised speaker adaptation experiments on a Mandarin Chinese continuous speech recognition task show that the new method outperforms both eigenvoice and maximum likelihood linear regression (MLLR) methods when sufficient adaptation data is available.