On the application of vector quantization and hidden Markov models to speaker-independent, isolated word recognition

L. Rabiner, S. Levinson, M. Sondhi
{"title":"On the application of vector quantization and hidden Markov models to speaker-independent, isolated word recognition","authors":"L. Rabiner, S. Levinson, M. Sondhi","doi":"10.1002/J.1538-7305.1983.TB03115.X","DOIUrl":null,"url":null,"abstract":"In this paper we present an approach to speaker-independent, isolated word recognition in which the well-known techniques of vector quantization and hidden Markov modeling are combined with a linear predictive coding analysis front end. This is done in the framework of a standard statistical pattern recognition model. Both the vector quantizer and the hidden Markov models need to be trained for the vocabulary being recognized. Such training results in a distinct hidden Markov model for each word of the vocabulary. Classification consists of computing the probability of generating the test word with each word model and choosing the word model that gives the highest probability. There are several factors, in both the vector quantizer and the hidden Markov modeling, that affect the performance of the overall word recognition system, including the size of the vector quantizer, the structure of the hidden Markov model, the ways of handling insufficient training data, etc. The effects, on recognition accuracy, of many of these factors are discussed in this paper. The entire recognizer (training and testing) has been evaluated on a 10-word digits vocabulary. For training, a set of 100 talkers spoke each of the digits one time. For testing, an independent set of 100 tokens of each of the digits was obtained. The overall recognition accuracy was found to be 96.5 percent for the 100-talker test set. These results are comparable to those obtained in earlier work, using a dynamic time-warping recognition algorithm with multiple templates per digit. It is also shown that the computation and storage requirements of the new recognizer were an order of magnitude less than that required for a conventional pattern recognition system using linear prediction with dynamic time warping.","PeriodicalId":447574,"journal":{"name":"The Bell System Technical Journal","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1983-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"355","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Bell System Technical Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/J.1538-7305.1983.TB03115.X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 355

Abstract

In this paper we present an approach to speaker-independent, isolated word recognition in which the well-known techniques of vector quantization and hidden Markov modeling are combined with a linear predictive coding analysis front end. This is done in the framework of a standard statistical pattern recognition model. Both the vector quantizer and the hidden Markov models need to be trained for the vocabulary being recognized. Such training results in a distinct hidden Markov model for each word of the vocabulary. Classification consists of computing the probability of generating the test word with each word model and choosing the word model that gives the highest probability. There are several factors, in both the vector quantizer and the hidden Markov modeling, that affect the performance of the overall word recognition system, including the size of the vector quantizer, the structure of the hidden Markov model, and the ways of handling insufficient training data. The effects of many of these factors on recognition accuracy are discussed in this paper. The entire recognizer (training and testing) has been evaluated on a 10-word (digits) vocabulary. For training, a set of 100 talkers spoke each of the digits one time. For testing, an independent set of 100 tokens of each of the digits was obtained. The overall recognition accuracy was found to be 96.5 percent for the 100-talker test set. These results are comparable to those obtained in earlier work using a dynamic time-warping recognition algorithm with multiple templates per digit. It is also shown that the computation and storage requirements of the new recognizer were an order of magnitude less than those required for a conventional pattern recognition system using linear prediction with dynamic time warping.
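The classification rule described in the abstract is the standard maximum-likelihood decision for discrete-observation HMMs: the test utterance is LPC-analyzed, each frame is mapped to a codebook index by the vector quantizer, and the resulting symbol sequence is scored against every word's HMM, with the best-scoring model giving the recognized word. The sketch below illustrates that scoring step only; it is not the authors' implementation. It is written in Python/NumPy, assumes the codebook and per-word model parameters (initial, transition, and emission probabilities in the log domain) have already been trained, and all function and variable names are hypothetical.

```python
import numpy as np

def log_forward(obs, log_pi, log_A, log_B):
    """Log-likelihood of a VQ symbol sequence under one discrete HMM.

    obs    : 1-D integer array of codebook indices (quantized LPC frames)
    log_pi : (N,)   log initial-state probabilities
    log_A  : (N, N) log transition probabilities, log_A[i, j] = log P(j | i)
    log_B  : (N, M) log symbol-emission probabilities, M = codebook size
    """
    log_alpha = log_pi + log_B[:, obs[0]]  # initialization with the first symbol
    for o in obs[1:]:
        # forward recursion in the log domain: logsumexp over previous states
        log_alpha = np.logaddexp.reduce(log_alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(log_alpha)  # total log P(obs | model)

def recognize(obs, word_models):
    """Return the vocabulary word whose HMM gives the highest likelihood."""
    scores = {word: log_forward(obs, *params) for word, params in word_models.items()}
    return max(scores, key=scores.get)
```

In this sketch, a call such as recognize(vq_symbols, word_models), where word_models maps each digit to its (log_pi, log_A, log_B) triple, returns the digit whose model assigns the highest log-likelihood to the quantized utterance, mirroring the decision rule stated in the abstract.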