An exploration of large vocabulary tools for small vocabulary phonetic recognition

Tara N. Sainath, B. Ramabhadran, M. Picheny
{"title":"An exploration of large vocabulary tools for small vocabulary phonetic recognition","authors":"Tara N. Sainath, B. Ramabhadran, M. Picheny","doi":"10.1109/ASRU.2009.5373263","DOIUrl":null,"url":null,"abstract":"While research in large vocabulary continuous speech recognition (LVCSR) has sparked the development of many state of the art research ideas, research in this domain suffers from two main drawbacks. First, because of the large number of parameters and poorly labeled transcriptions, gaining insight into further improvements based on error analysis is very difficult. Second, LVCSR systems often take a significantly longer time to train and test new research ideas compared to small vocabulary tasks. A small vocabulary task like TIMIT provides a phonetically rich and hand-labeled corpus and offers a good test bed to study algorithmic improvements. However, oftentimes research ideas explored for small vocabulary tasks do not always provide gains on LVCSR systems. In this paper, we address these issues by taking the standard Ȝrecipeȝ used in typical LVCSR systems and applying it to the TIMIT phonetic recognition corpus, which provides a standard benchmark to compare methods. We find that at the speaker-independent (SI) level, our results offer comparable performance to other SI HMM systems. By taking advantage of speaker adaptation and discriminative training techniques commonly used in LVCSR systems, we achieve an error rate of 20%, the best results reported on the TIMIT task to date, moving us closer to the human reported phonetic recognition error rate of 15%. We propose the use of this system as the baseline for future research and believe that it will serve as a good framework to explore ideas that will carry over to LVCSR systems.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"41","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2009.5373263","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 41

Abstract

While research in large vocabulary continuous speech recognition (LVCSR) has sparked the development of many state-of-the-art research ideas, research in this domain suffers from two main drawbacks. First, because of the large number of parameters and poorly labeled transcriptions, gaining insight into further improvements based on error analysis is very difficult. Second, LVCSR systems often take a significantly longer time to train and test new research ideas compared to small vocabulary tasks. A small vocabulary task like TIMIT provides a phonetically rich, hand-labeled corpus and offers a good test bed to study algorithmic improvements. However, research ideas explored for small vocabulary tasks do not always provide gains on LVCSR systems. In this paper, we address these issues by taking the standard "recipe" used in typical LVCSR systems and applying it to the TIMIT phonetic recognition corpus, which provides a standard benchmark for comparing methods. We find that at the speaker-independent (SI) level, our results offer comparable performance to other SI HMM systems. By taking advantage of speaker adaptation and discriminative training techniques commonly used in LVCSR systems, we achieve an error rate of 20%, the best result reported on the TIMIT task to date, moving us closer to the human phonetic recognition error rate of 15% reported in the literature. We propose the use of this system as the baseline for future research and believe that it will serve as a good framework to explore ideas that will carry over to LVCSR systems.
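The headline figure above is a phone error rate (PER) of 20% on TIMIT. For readers unfamiliar with TIMIT scoring, the sketch below illustrates how PER is conventionally computed: reference and hypothesis phone strings are folded from TIMIT's 61 training labels down to the standard 39-phone evaluation set, and errors are counted by Levenshtein alignment. This is an illustrative sketch only; the folding table is an abbreviated subset of the standard mapping, and the code is not the authors' scoring tool.

```python
# Minimal sketch of TIMIT-style phone error rate (PER) scoring, assuming the
# conventional setup: phone strings are folded from 61 labels to the 39-phone
# evaluation set before Levenshtein alignment. The folding table below is an
# abbreviated, illustrative subset, not the full standard mapping.
from typing import Dict, List

# Illustrative subset of the standard 61 -> 39 phone folding.
FOLD: Dict[str, str] = {
    "ao": "aa", "ax": "ah", "ax-h": "ah", "axr": "er",
    "hv": "hh", "ix": "ih", "el": "l", "em": "m",
    "en": "n", "nx": "n", "eng": "ng", "zh": "sh",
    "ux": "uw", "pcl": "sil", "tcl": "sil", "kcl": "sil",
    "bcl": "sil", "dcl": "sil", "gcl": "sil", "h#": "sil",
    "pau": "sil", "epi": "sil", "q": "",  # glottal stop is typically removed
}

def fold(phones: List[str]) -> List[str]:
    """Map phones to the 39-phone evaluation set, dropping removed symbols."""
    out = []
    for p in phones:
        p = FOLD.get(p, p)
        if p:
            out.append(p)
    return out

def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance: substitutions + insertions + deletions."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)]

def phone_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """PER = total edit distance / total number of reference phones."""
    errors = sum(edit_distance(fold(r), fold(h)) for r, h in zip(refs, hyps))
    total = sum(len(fold(r)) for r in refs)
    return errors / total

if __name__ == "__main__":
    # Toy example with hand-made reference and hypothesis phone strings.
    ref = [["h#", "sh", "ix", "hv", "eh", "dcl", "jh", "ih", "dcl", "d", "h#"]]
    hyp = [["sil", "sh", "ih", "hh", "eh", "sil", "jh", "ih", "d", "sil"]]
    print(f"PER: {100 * phone_error_rate(ref, hyp):.1f}%")
```

Full scoring pipelines typically also collapse repeated silences and use a tool such as NIST sclite for alignment; the sketch only conveys the metric behind the 20% figure.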