Context-dependent phonetic Markov models for large vocabulary speech recognition

ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing. Pub Date: 1987-04-06. DOI: 10.1109/ICASSP.1987.1169604

Anne-Marie Derouault


Citations: 35

Abstract




One approach to large-vocabulary speech recognition is to build phonetic Markov models and concatenate them to obtain word models. In previous work, we designed a recognizer based on 40 phonetic Markov machines that accepts a 10,000-word vocabulary ([3]) and, more recently, a 200,000-word vocabulary ([5]). Because there is one machine per phoneme, these models do not account for coarticulatory effects, which can lead to recognition errors. In this paper, we improve the phonetic models by applying general principles about the effects of coarticulation on automatic phoneme recognition. We show that both an analysis of the errors made by the recognizer and linguistic facts about the influence of phonetic context suggest a method for choosing context-dependent models. This method limits the growth in the number of phonetic models while still accounting for the most important coarticulation effects. We present experiments with a system that applies these principles to a set of models for French. With this new system, which includes context-dependent machines, the phoneme recognition rate rises from 82.2% to 85.3%, and the word error rate with a 10,000-word dictionary decreases from 11.2% to 9.8%.
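The abstract describes word models built by concatenating phonetic Markov machines, with context-dependent machines used only for a selected set of contexts. A minimal Python sketch of that construction follows. It rests on assumptions not taken from the paper: each machine is reduced to a name and a state count, context is keyed on the right-hand neighbouring phoneme only, and the names `PhoneModel`, `select_model`, `build_word_model` and `allowed_transitions` are hypothetical. Output distributions, training and decoding are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PhoneModel:
    # Hypothetical, simplified stand-in for a phonetic Markov machine:
    # only its name and the number of states in its left-to-right chain.
    name: str
    n_states: int


CIInventory = Dict[str, PhoneModel]                       # one machine per phoneme
CDInventory = Dict[Tuple[str, Optional[str]], PhoneModel]  # (phoneme, right context)


def select_model(phone: str, right: Optional[str],
                 ci: CIInventory, cd: CDInventory) -> PhoneModel:
    # Use a context-dependent machine only if one was chosen for this
    # (phoneme, right neighbour) pair; otherwise fall back to the
    # context-independent machine for the phoneme.
    return cd.get((phone, right), ci[phone])


def build_word_model(phones: List[str], ci: CIInventory,
                     cd: CDInventory) -> List[Tuple[str, int]]:
    # Concatenate the selected machines left to right. The word model is
    # returned as its ordered list of states (machine name, state index);
    # the last state of each machine feeds the first state of the next.
    states: List[Tuple[str, int]] = []
    for i, phone in enumerate(phones):
        right = phones[i + 1] if i + 1 < len(phones) else None  # None = word end
        model = select_model(phone, right, ci, cd)
        states.extend((model.name, s) for s in range(model.n_states))
    return states


def allowed_transitions(n_states: int) -> List[Tuple[int, int]]:
    # Left-to-right topology over the concatenated states: a self-loop on
    # each state plus a forward arc to the next state.
    arcs = [(s, s) for s in range(n_states)]
    arcs += [(s, s + 1) for s in range(n_states - 1)]
    return arcs


if __name__ == "__main__":
    ci = {p: PhoneModel(p, 3) for p in ["b", "a", "l"]}
    cd = {("a", "l"): PhoneModel("a+l", 3)}  # one hypothetical CD machine
    word = build_word_model(["b", "a", "l"], ci, cd)
    print(word)
    print(len(ci) + len(cd), "machines in the inventory")
```

Here the hypothetical machine `a+l` replaces the context-independent `a` only before `l`, and the inventory grows by one machine. For comparison, a complete right-context inventory over 40 phonemes could need up to 40 × 40 = 1,600 machines (ignoring word-boundary contexts), which is why a sketch like this falls back to the context-independent machine wherever no context-dependent one was selected.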


Source journal

ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing

Self-citation rate

0.00%


Latest articles in this journal

A high resolution data-adaptive time-frequency representation
A fast prediction-error detector for estimating sparse-spike sequences
Some applications of mathematical morphology to range imagery
Parameter estimation using the autocorrelation of the discrete Fourier transform
Array signal processing with interconnected Neuron-like elements