Multi-stream temporally varying weight regression for cross-lingual speech recognition

Shilin Liu, K. Sim
{"title":"Multi-stream temporally varying weight regression for cross-lingual speech recognition","authors":"Shilin Liu, K. Sim","doi":"10.1109/ASRU.2013.6707769","DOIUrl":null,"url":null,"abstract":"Building a good Automatic Speech Recognition (ASR) system with limited resources is a very challenging task due to the existing many speech variations. Multilingual and cross-lingual speech recognition techniques are commonly used for this task. This paper investigates the recently proposed Temporally Varying Weight Regression (TVWR) method for cross-lingual speech recognition. TVWR uses posterior features to implicitly model the long-term temporal structures in acoustic patterns. By leveraging on the well-trained foreign recognizers, high quality monophone/state posteriors can be easily incorporated into TVWR to boost the ASR performance on low-resource languages. Furthermore, multi-stream TVWR is proposed, where multiple sets of posterior features are used to incorporate richer (temporal and spatial) context information. Finally, a separate state-tying for the TVWR regression parameters is used to better utilize the more reliable posterior features. Experimental results are evaluated for English and Malay speech recognition with limited resources. By using the Czech, Hungarian and Russian posterior features, TVWR was found to consistently outperform the tandem systems trained on the same features.","PeriodicalId":265258,"journal":{"name":"2013 IEEE Workshop on Automatic Speech Recognition and Understanding","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE Workshop on Automatic Speech Recognition and Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2013.6707769","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Building a good Automatic Speech Recognition (ASR) system with limited resources is a very challenging task due to the existing many speech variations. Multilingual and cross-lingual speech recognition techniques are commonly used for this task. This paper investigates the recently proposed Temporally Varying Weight Regression (TVWR) method for cross-lingual speech recognition. TVWR uses posterior features to implicitly model the long-term temporal structures in acoustic patterns. By leveraging on the well-trained foreign recognizers, high quality monophone/state posteriors can be easily incorporated into TVWR to boost the ASR performance on low-resource languages. Furthermore, multi-stream TVWR is proposed, where multiple sets of posterior features are used to incorporate richer (temporal and spatial) context information. Finally, a separate state-tying for the TVWR regression parameters is used to better utilize the more reliable posterior features. Experimental results are evaluated for English and Malay speech recognition with limited resources. By using the Czech, Hungarian and Russian posterior features, TVWR was found to consistently outperform the tandem systems trained on the same features.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
跨语言语音识别的多流时变权回归
在有限的资源下,构建一个好的自动语音识别系统是一项非常具有挑战性的任务,因为语音变体很多。多语言和跨语言语音识别技术通常用于此任务。本文研究了最近提出的用于跨语言语音识别的时间变权回归(TVWR)方法。TVWR使用后验特征来隐式模拟声学模式中的长期时间结构。通过利用训练有素的外国识别器,高质量的单音/状态后置可以很容易地融入TVWR中,以提高低资源语言的ASR性能。在此基础上,提出了多流TVWR算法,利用多组后验特征融合更丰富的(时间和空间)上下文信息。最后,对TVWR回归参数使用单独的状态绑定,以更好地利用更可靠的后验特征。在有限的资源条件下,对马来语和英语语音识别的实验结果进行了评价。通过使用捷克、匈牙利和俄罗斯的后验特征,TVWR被发现始终优于在相同特征上训练的串联系统。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Learning filter banks within a deep neural network framework Efficient nearly error-less LVCSR decoding based on incremental forward and backward passes Porting concepts from DNNs back to GMMs Discriminative piecewise linear transformation based on deep learning for noise robust automatic speech recognition Acoustic modeling using transform-based phone-cluster adaptive training
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1