An unsupervised vocabulary selection technique for Chinese automatic speech recognition

Yike Zhang, Pengyuan Zhang, Ta Li, Yonghong Yan
DOI: 10.1109/SLT.2016.7846298 (https://doi.org/10.1109/SLT.2016.7846298)
Published in: 2016 IEEE Spoken Language Technology Workshop (SLT), 2016-12-01
Citations: 2

Abstract

The vocabulary is a vital component of automatic speech recognition (ASR) systems. For a specific Chinese speech recognition task, using a large general vocabulary not only lengthens decoding time but also hurts recognition accuracy. In this paper, we propose an unsupervised algorithm that selects task-specific words from a large general vocabulary. The out-of-vocabulary (OOV) rate is a measure of vocabulary quality and is related to recognition accuracy. However, the OOV rate is hard to compute for a Chinese vocabulary, because OOV words are often segmented into single Chinese characters and most Chinese vocabularies contain all single Chinese characters. To deal with this problem, we propose a novel method to estimate the OOV rate of Chinese vocabularies. In experiments, we found that our estimated OOV rate is related to the character error rate (CER) of recognition. On two Chinese conversational telephone speech (CTS) evaluation sets, our proposed vocabulary selection method gave both the lowest OOV rate and the lowest CER compared with the general vocabulary and a frequency-based vocabulary selection method. In addition, our proposed method significantly reduced the size of the language model (LM) and the corresponding weighted finite-state transducer (WFST) network, which led to more efficient decoding.
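
To illustrate why a directly counted OOV rate is uninformative for Chinese, the following minimal sketch segments a toy sentence with greedy forward maximum matching and counts tokens missing from the vocabulary. It is not the estimation method proposed in the paper, which the abstract does not describe. The segmenter, the toy sentence, the toy vocabulary, and the hand-written reference segmentation are all assumptions made for illustration.

```python
def segment(text, vocab, max_len=4):
    """Greedy forward maximum matching; falls back to single characters."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in vocab:
                tokens.append(piece)
                break
        i += len(tokens[-1])
    return tokens


def oov_rate(tokens, vocab):
    """Fraction of tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    return sum(t not in vocab for t in tokens) / len(tokens)


text = "我喜欢语音识别"
single_chars = set(text)                     # assumption: all single characters are in the vocabulary
vocab = single_chars | {"喜欢", "语音"}       # the word "识别" is deliberately left out

# Segmenting with the vocabulary itself splits the missing word into characters.
tokens = segment(text, vocab)
naive = oov_rate(tokens, vocab)

# Hypothetical hand-made reference segmentation that keeps "识别" as one word.
reference = ["我", "喜欢", "语音", "识别"]
against_reference = oov_rate(reference, vocab)

print(tokens, naive, against_reference)
```

In this toy case the missing word is split into two in-vocabulary characters, so the naive count reports no OOV tokens, while counting against the reference segmentation reveals that one word in four is missing.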