Sayaka Shiota, Kei Hashimoto, Yoshihiko Nankaku, K. Tokuda
A model structure integration based on a Bayesian framework for speech recognition
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012-03-25
DOI: 10.1109/ICASSP.2012.6288996 (https://doi.org/10.1109/ICASSP.2012.6288996)
Citation count: 0
Abstract
This paper proposes an acoustic modeling technique for speech recognition that uses multiple model structures within a Bayesian framework. The Bayesian approach is a statistical technique that estimates reliable predictive distributions by marginalizing over model parameters, and its effectiveness in HMM-based speech recognition has been reported. Although the basic idea of the Bayesian approach is to treat all parameters as random variables, the conventional method still selects only a single model structure. In the proposed method, multiple model structures are treated as latent variables and integrated within the Bayesian framework. Furthermore, deterministic annealing is applied to the training algorithm to estimate appropriate acoustic models. The proposed method makes effective use of multiple model structures, especially in the early stage of training, which leads to better predictive distributions and improved recognition performance.
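
The idea of weighting several model structures under deterministic annealing can be illustrated with a minimal sketch. It is not the authors' training algorithm, which the abstract does not spell out. It assumes that each candidate structure m already has a log evidence value (or a lower bound on it) and that the posterior over structures is tempered as q(m) proportional to (p(m) * evidence(m)) ** beta, with the temperature parameter beta raised toward 1 during training. The function name, the beta schedule, and the numeric values are all hypothetical.

```python
import math

def tempered_structure_posterior(log_evidence, log_prior, beta):
    """Return q(m) proportional to (p(m) * exp(log_evidence[m])) ** beta.

    Assumption: this tempering form is a generic deterministic-annealing
    choice, not a formula taken from the paper.
    """
    scores = [beta * (le + lp) for le, lp in zip(log_evidence, log_prior)]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log evidence values for three candidate model structures.
log_evidence = [-1050.0, -1042.0, -1047.0]
log_prior = [math.log(1.0 / 3.0)] * 3  # uniform prior over structures

# Hypothetical annealing schedule: small beta early, beta = 1 at the end.
for beta in (0.01, 0.1, 0.5, 1.0):
    q = tempered_structure_posterior(log_evidence, log_prior, beta)
    print(beta, [round(p, 3) for p in q])
```

With a small beta the weights stay close to uniform, so every structure contributes, which matches the abstract's remark that multiple structures are used most in the early stage of training. As beta approaches 1 the weights concentrate on the structure with the highest evidence.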