Signal Processing Cues to Improve Automatic Speech Recognition for Low Resource Indian Languages

Arun Baby, S. Karthik Pandia D., H. Murthy
Workshop on Spoken Language Technologies for Under-resourced Languages, published 2018-08-29
DOI: 10.21437/SLTU.2018-6 (https://doi.org/10.21437/SLTU.2018-6)
Citations: 1

Abstract

This paper focuses on building accurate acoustic models for low resource languages. Acoustic models are likely to be accurate provided that phone boundaries are determined accurately. Conventional flat-start Viterbi phone alignment, where only utterance-level transcriptions are available, yields poor phone boundaries because the boundaries are not explicitly modeled in any statistical machine learning system. The effort in this paper is to model phrase boundaries explicitly, using acoustic cues obtained through signal processing. A phrase is a sequence of words, and each word is a sequence of syllables. Syllable boundaries are detected using signal processing. The waveform corresponding to an utterance is spliced at a phrase boundary when that boundary matches a syllable boundary. Gaussian mixture model-hidden Markov model (GMM-HMM) training is then performed phrase by phrase rather than utterance by utterance. Training on these short phrases yields better acoustic models. The resulting alignment is fed to a deep neural network (DNN) to enable better discrimination between phones. During training, the syllable boundaries obtained using signal processing are restored in every iteration. A relative improvement in word error rate (WER) over the baseline is observed for the Indian languages Gujarati, Tamil, and Telugu.
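
The splicing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how syllable boundaries are detected or how a "match" is decided, so this sketch assumes both boundary lists are already available as times in seconds and treats a phrase boundary as matching when it lies within a hypothetical tolerance of 20 ms of a syllable boundary. The function names `confirmed_phrase_boundaries` and `split_waveform` are illustrative only.

```python
from bisect import bisect_left
from typing import List, Sequence


def confirmed_phrase_boundaries(
    phrase_times: Sequence[float],
    syllable_times: Sequence[float],
    tolerance: float = 0.02,  # assumed matching window in seconds, not from the paper
) -> List[float]:
    """Keep phrase boundaries that lie within `tolerance` of a syllable boundary.

    Each kept boundary is snapped to its nearest syllable boundary.
    """
    syllables = sorted(syllable_times)
    kept: List[float] = []
    for t in sorted(phrase_times):
        i = bisect_left(syllables, t)
        # The nearest syllable boundary is the one just before or just after t.
        neighbours = syllables[max(i - 1, 0): i + 1]
        if not neighbours:
            continue
        nearest = min(neighbours, key=lambda s: abs(s - t))
        # Skip boundaries that are too far away or already used.
        if abs(nearest - t) <= tolerance and (not kept or nearest > kept[-1]):
            kept.append(nearest)
    return kept


def split_waveform(
    samples: Sequence[float],
    sample_rate: int,
    boundaries: Sequence[float],
) -> List[Sequence[float]]:
    """Cut a waveform into consecutive segments at the given times (seconds)."""
    cuts = [0] + [round(b * sample_rate) for b in boundaries] + [len(samples)]
    return [samples[a:b] for a, b in zip(cuts, cuts[1:]) if b > a]


if __name__ == "__main__":
    # Toy data: two candidate phrase boundaries, five detected syllable boundaries.
    phrase_times = [0.50, 1.20]
    syllable_times = [0.25, 0.51, 0.80, 1.00, 1.45]
    cuts = confirmed_phrase_boundaries(phrase_times, syllable_times)
    segments = split_waveform(list(range(100)), sample_rate=100, boundaries=cuts)
    print(cuts, [len(s) for s in segments])
```

In this toy example only the first candidate is close enough to a syllable boundary to be kept, so the utterance is cut into two shorter segments that could then be used for phrase-by-phrase training.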