上下文相关语音马尔可夫模型用于大词汇量语音识别

Anne-Marie Derouault
{"title":"上下文相关语音马尔可夫模型用于大词汇量语音识别","authors":"Anne-Marie Derouault","doi":"10.1109/ICASSP.1987.1169604","DOIUrl":null,"url":null,"abstract":"One approach to large vocabulary speech recognition, is to build phonetic Markov models, and to concatenate them to obtain word models. In previous work, we already designed a recognizer based on 40 phonetic Markov machines, which accepts a 10,000 words vocabulary ([3]), and recently 200,000 words vocabulary ([5]). Since there is one machine per phoneme, these models obviously do not account for coarticulatory effects, which may lead to recognition errors. In this paper, we improve the phonetic models by using general principles about coarticulation effects on automatic phoneme recognition. We show that both the analysis of the errors made by the recognizer, and linguistic facts about phonetic context influence, suggest a method for choosing context dependent models. This method allows to limit the growing of the number of phonems, and still account for the most important coarticulation effects. We present our experiments with a system applying these principles to a set of models for French. With this new system including context-dependant machines, the phoneme recognition rate goes from 82.2% to 85.3%, and the error rate on words with a 10,000 word dictionary, is decreased from 11.2 to 9.8%.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"35","resultStr":"{\"title\":\"Context-dependent phonetic Markov models for large vocabulary speech recognition\",\"authors\":\"Anne-Marie Derouault\",\"doi\":\"10.1109/ICASSP.1987.1169604\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"One approach to large vocabulary speech recognition, is to build phonetic Markov models, and to concatenate them to obtain word models. In previous work, we already designed a recognizer based on 40 phonetic Markov machines, which accepts a 10,000 words vocabulary ([3]), and recently 200,000 words vocabulary ([5]). Since there is one machine per phoneme, these models obviously do not account for coarticulatory effects, which may lead to recognition errors. In this paper, we improve the phonetic models by using general principles about coarticulation effects on automatic phoneme recognition. We show that both the analysis of the errors made by the recognizer, and linguistic facts about phonetic context influence, suggest a method for choosing context dependent models. This method allows to limit the growing of the number of phonems, and still account for the most important coarticulation effects. We present our experiments with a system applying these principles to a set of models for French. With this new system including context-dependant machines, the phoneme recognition rate goes from 82.2% to 85.3%, and the error rate on words with a 10,000 word dictionary, is decreased from 11.2 to 9.8%.\",\"PeriodicalId\":140810,\"journal\":{\"name\":\"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1987-04-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"35\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICASSP.1987.1169604\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.1987.1169604","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 35

摘要

大词汇量语音识别的一种方法是建立语音马尔可夫模型,并将它们连接起来获得单词模型。在之前的工作中,我们已经设计了一个基于40个语音马尔可夫机的识别器,该识别器接受10,000个单词的词汇量([3]),最近接受200,000个单词的词汇量([5])。由于每个音素有一台机器,这些模型显然没有考虑到协同发音效应,这可能导致识别错误。本文利用协同发音在自动音素识别中的一般原理,对语音模型进行了改进。我们表明,无论是对识别器所犯错误的分析,还是对语音语境影响的语言学事实,都提出了一种选择语境依赖模型的方法。这种方法允许限制音素数量的增长,并且仍然考虑到最重要的协同发音效果。我们展示了我们的实验系统,将这些原则应用于法语的一组模型。在包含上下文相关机器的新系统中,音素识别率从82.2%提高到85.3%,在1万字字典中,单词的错误率从11.2%下降到9.8%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Context-dependent phonetic Markov models for large vocabulary speech recognition
One approach to large vocabulary speech recognition, is to build phonetic Markov models, and to concatenate them to obtain word models. In previous work, we already designed a recognizer based on 40 phonetic Markov machines, which accepts a 10,000 words vocabulary ([3]), and recently 200,000 words vocabulary ([5]). Since there is one machine per phoneme, these models obviously do not account for coarticulatory effects, which may lead to recognition errors. In this paper, we improve the phonetic models by using general principles about coarticulation effects on automatic phoneme recognition. We show that both the analysis of the errors made by the recognizer, and linguistic facts about phonetic context influence, suggest a method for choosing context dependent models. This method allows to limit the growing of the number of phonems, and still account for the most important coarticulation effects. We present our experiments with a system applying these principles to a set of models for French. With this new system including context-dependant machines, the phoneme recognition rate goes from 82.2% to 85.3%, and the error rate on words with a 10,000 word dictionary, is decreased from 11.2 to 9.8%.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A high resolution data-adaptive time-frequency representation A fast prediction-error detector for estimating sparse-spike sequences Some applications of mathematical morphology to range imagery Parameter estimation using the autocorrelation of the discrete Fourier transform Array signal processing with interconnected Neuron-like elements
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1