Bayesian learning of speech duration models

Jen-Tzung Chien, Chih-Hsien Huang
{"title":"Bayesian learning of speech duration models","authors":"Jen-Tzung Chien, Chih-Hsien Huang","doi":"10.1109/TSA.2003.818114","DOIUrl":null,"url":null,"abstract":"This paper presents the Bayesian speech duration modeling and learning for hidden Markov model (HMM) based speech recognition. We focus on the sequential learning of HMM state duration using quasi-Bayes (QB) estimate. The adapted duration models are robust to nonstationary speaking rates and noise conditions. In this study, the Gaussian, Poisson, and gamma distributions are investigated to characterize the duration models. The maximum a posteriori (MAP) estimate of gamma duration model is developed. To exploit the sequential learning, we adopt the Poisson duration model incorporated with gamma prior density, which belongs to the conjugate prior family. When the adaptation data are sequentially observed, the gamma posterior density is produced with twofold advantages. One is to determine the optimal QB duration parameter, which can be merged in HMMs for speech recognition. The other one is to build the updating mechanism of gamma prior statistics for sequential learning. EM algorithm is applied to fulfill QB parameter estimation. The adaptation of overall HMM parameters can be performed simultaneously. In the experiments, the proposed adaptive duration model improves the speech recognition performance of Mandarin broadcast news and noisy connected digits. The batch and sequential learning are respectively investigated for MAP and QB duration models.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"75 1","pages":"558-567"},"PeriodicalIF":0.0000,"publicationDate":"2003-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"32","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Trans. Speech Audio Process.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TSA.2003.818114","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 32

Abstract

This paper presents the Bayesian speech duration modeling and learning for hidden Markov model (HMM) based speech recognition. We focus on the sequential learning of HMM state duration using quasi-Bayes (QB) estimate. The adapted duration models are robust to nonstationary speaking rates and noise conditions. In this study, the Gaussian, Poisson, and gamma distributions are investigated to characterize the duration models. The maximum a posteriori (MAP) estimate of gamma duration model is developed. To exploit the sequential learning, we adopt the Poisson duration model incorporated with gamma prior density, which belongs to the conjugate prior family. When the adaptation data are sequentially observed, the gamma posterior density is produced with twofold advantages. One is to determine the optimal QB duration parameter, which can be merged in HMMs for speech recognition. The other one is to build the updating mechanism of gamma prior statistics for sequential learning. EM algorithm is applied to fulfill QB parameter estimation. The adaptation of overall HMM parameters can be performed simultaneously. In the experiments, the proposed adaptive duration model improves the speech recognition performance of Mandarin broadcast news and noisy connected digits. The batch and sequential learning are respectively investigated for MAP and QB duration models.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
语音持续时间模型的贝叶斯学习
提出了基于隐马尔可夫模型(HMM)的语音识别的贝叶斯语音持续时间建模和学习方法。研究了基于准贝叶斯估计的HMM状态持续时间序列学习。该模型对非平稳语速和噪声条件具有较强的鲁棒性。在本研究中,研究了高斯分布、泊松分布和伽马分布来表征持续时间模型。提出了伽玛持续时间模型的最大后验估计方法。为了利用序列学习,我们采用了结合gamma先验密度的泊松持续时间模型,该模型属于共轭先验族。当连续观测自适应数据时,产生的伽玛后验密度具有双重优势。一是确定最佳QB持续时间参数,并将其合并到hmm中进行语音识别。二是建立序列学习的先验统计量更新机制。采用EM算法实现QB参数估计。HMM整体参数的自适应可以同时进行。在实验中,提出的自适应时长模型提高了普通话广播新闻和噪声连接数字的语音识别性能。分别研究了MAP和QB持续时间模型的批学习和顺序学习。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Errata to "Using Steady-State Suppression to Improve Speech Intelligibility in Reverberant Environments for Elderly Listeners" Farewell Editorial Inaugural Editorial: Riding the Tidal Wave of Human-Centric Information Processing - Innovate, Outreach, Collaborate, Connect, Expand, and Win Three-Dimensional Sound Field Reproduction Using Multiple Circular Loudspeaker Arrays Introduction to the Special Issue on Processing Reverberant Speech: Methodologies and Applications
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1