Progress on Mandarin conversational telephone speech recognition
M. Hwang, X. Lei, Tim Ng, I. Bulyko, Mari Ostendorf, A. Stolcke, Wen Wang, Jing Zheng, V. R. Gadde, M. Graciarena, M. Siu, Yan Huang
2004 International Symposium on Chinese Spoken Language Processing, published 2004-12-15
DOI: 10.1109/CHINSL.2004.1409571 (https://doi.org/10.1109/CHINSL.2004.1409571)
Citation count: 13
Abstract
Over the past decade, there has been good progress on English conversational telephone speech (CTS) recognition, built on the Switchboard and Fisher corpora. In this paper, we present our efforts on extending language-independent technologies into Mandarin CTS, as well as addressing language-dependent issues such as tone. We show the impact of each of the following factors: (a) simplified Mandarin phone set; (b) pitch features; (c) auto-retrieved Web texts for augmenting n-gram training; (d) speaker adaptive training; (e) maximum mutual information estimation; (f) decision-tree-based parameter sharing; (g) cross-word co-articulation modeling; and (h) combining MFCC and PLP decoding outputs using confusion networks. We have reduced the Chinese character error rate (CER) of the BBN-2003 development test set from 53.8% to 46.8% after (a)+(b)+(c)+(f)+(g) are combined. Further reduction in CER is anticipated after integrating all improvements.
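The abstract reports results as character error rate (CER). As a minimal sketch, assuming the conventional definition (substitutions, deletions, and insertions divided by the number of reference characters), the metric and the relative size of the reported improvement can be computed as below. The function names and the example sentences are illustrative assumptions; the scoring tool actually used in the paper is not stated in the abstract.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]  # cost of deleting the first i reference characters
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative error reduction between two error rates."""
    return (before - after) / before


# Hypothetical reference/hypothesis pair: one substitution and one deletion.
example_cer = cer("今天天气很好", "今天天汽好")

# CER figures taken from the abstract (in percent).
reduction = relative_reduction(53.8, 46.8)
print(f"example CER: {example_cer:.3f}")
print(f"relative CER reduction: {reduction:.1%}")
```

Under this definition, the drop from 53.8% to 46.8% is 7.0 points absolute, or about 13% relative.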