{"title":"Efficient integration of multiple pronunciations in a large vocabulary decoder","authors":"H. Schramm, X. Aubert","doi":"10.1109/ICASSP.2000.862068","journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"38 1","pages":"0"},"publicationDate":"2000-06-05","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"20","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/ICASSP.2000.862068"}
Efficient integration of multiple pronunciations in a large vocabulary decoder
The paper describes improved handling of multiple pronunciations in the Philips research decoder, achieved by (1) incorporating prior information about their distributions and (2) combining the acoustic contributions of concurrent alternate word hypotheses. Starting from a baseline system in which multiple pronunciations are treated as word copies without priors, an extension of the usual Viterbi decoding is presented that integrates unigram priors in a weighted sum of acoustic probabilities. Several approximations are discussed, leading to new decoding aspects. Experimental results are presented for US broadcast news recordings. The use of unigram priors has a clear positive impact on both error rate and decoding cost, while summing over the contributions of multiple pronunciations brings a further small improvement. An overall 4% reduction of the error rate is achieved on the 1997 and 1998 HUB-4 evaluation sets.
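
The three ways of scoring a word mentioned in the abstract (word copies without priors, unigram priors, and a prior-weighted sum over pronunciations) can be illustrated with a small log-domain sketch. This is not the Philips decoder's implementation: the function names are hypothetical, and the sketch assumes each pronunciation variant has already been given an acoustic log-likelihood over the same stretch of audio, whereas the paper applies the idea inside the search with approximations that the abstract does not spell out.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-domain values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def word_score_baseline(acoustic_logprobs):
    """Baseline: pronunciations act as word copies without priors,
    so the best-scoring variant alone determines the word score."""
    return max(acoustic_logprobs)


def word_score_max_with_priors(acoustic_logprobs, priors):
    """Viterbi-style maximum, with each variant weighted by its unigram prior.
    Assumes every prior is strictly positive."""
    return max(math.log(p) + a for a, p in zip(acoustic_logprobs, priors))


def word_score_sum_with_priors(acoustic_logprobs, priors):
    """Prior-weighted sum of acoustic probabilities over all variants:
    log sum_v P(v | w) * p(x | v). Assumes every prior is strictly positive."""
    return log_sum_exp(
        [math.log(p) + a for a, p in zip(acoustic_logprobs, priors)]
    )


# Illustrative numbers only: two pronunciations of one word.
acoustic = [-10.0, -10.5]   # log p(x | pronunciation)
priors = [0.8, 0.2]         # unigram pronunciation priors, summing to 1

print(word_score_baseline(acoustic))
print(word_score_max_with_priors(acoustic, priors))
print(word_score_sum_with_priors(acoustic, priors))
```

Because every prior is at most 1, the prior-weighted maximum never exceeds the baseline score, and the sum is never smaller than the prior-weighted maximum. How much the sum adds depends on how close the competing variants are in score.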