语音识别的联合瓶颈特征与注意模型

Long Xingyan, Qu Dan
{"title":"语音识别的联合瓶颈特征与注意模型","authors":"Long Xingyan, Qu Dan","doi":"10.1145/3208788.3208798","DOIUrl":null,"url":null,"abstract":"Recently, attention based sequence-to-sequence model become a research hotspot in speech recognition. The attention model has the problem of slow convergence and poor robustness. In this paper, a model that jointed a bottleneck feature extraction network and attention model is proposed. The model is composed of a Deep Belief Network as bottleneck feature extraction network and an attention-based encoder-decoder model. DBN can store the priori information from Hidden Markov Model so that increasing convergence speed of and enhancing both robustness and discrimination of features. Attention model utilizes the temporal information of feature sequence to calculate the posterior probability of phoneme. Then the number of stack recurrent neural network layers in attention model is reduced in order to decrease the calculation of gradient. Experiments in the TIMIT corpus showed that the phoneme error rate is 17.80% in test set, the average training iteration decreased 52%, and the number of training iterations decreased from 139 to 89. The word error rate of WSJ eval92 is 12.9% without any external language model.","PeriodicalId":211585,"journal":{"name":"Proceedings of 2018 International Conference on Mathematics and Artificial Intelligence","volume":"44 3","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Joint bottleneck feature and attention model for speech recognition\",\"authors\":\"Long Xingyan, Qu Dan\",\"doi\":\"10.1145/3208788.3208798\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, attention based sequence-to-sequence model become a research hotspot in speech recognition. The attention model has the problem of slow convergence and poor robustness. In this paper, a model that jointed a bottleneck feature extraction network and attention model is proposed. The model is composed of a Deep Belief Network as bottleneck feature extraction network and an attention-based encoder-decoder model. DBN can store the priori information from Hidden Markov Model so that increasing convergence speed of and enhancing both robustness and discrimination of features. Attention model utilizes the temporal information of feature sequence to calculate the posterior probability of phoneme. Then the number of stack recurrent neural network layers in attention model is reduced in order to decrease the calculation of gradient. Experiments in the TIMIT corpus showed that the phoneme error rate is 17.80% in test set, the average training iteration decreased 52%, and the number of training iterations decreased from 139 to 89. The word error rate of WSJ eval92 is 12.9% without any external language model.\",\"PeriodicalId\":211585,\"journal\":{\"name\":\"Proceedings of 2018 International Conference on Mathematics and Artificial Intelligence\",\"volume\":\"44 3\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-04-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of 2018 International Conference on Mathematics and Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3208788.3208798\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 2018 International Conference on Mathematics and Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3208788.3208798","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

近年来,基于注意力的序列到序列模型成为语音识别领域的研究热点。注意模型存在收敛速度慢、鲁棒性差的问题。本文提出了一个瓶颈特征提取网络与注意力模型相结合的模型。该模型由深度信念网络作为瓶颈特征提取网络和基于注意力的编码器-解码器模型组成。DBN存储了隐马尔可夫模型的先验信息,提高了算法的收敛速度,增强了特征的鲁棒性和识别能力。注意模型利用特征序列的时间信息来计算音素的后验概率。然后减少了注意模型中堆栈递归神经网络的层数,以减少梯度的计算。在TIMIT语料库上的实验表明,测试集的音素错误率为17.80%,平均训练迭代次数减少52%,训练迭代次数从139次减少到89次。在没有任何外部语言模型的情况下,WSJ eval92的单词错误率为12.9%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Joint bottleneck feature and attention model for speech recognition
Recently, attention based sequence-to-sequence model become a research hotspot in speech recognition. The attention model has the problem of slow convergence and poor robustness. In this paper, a model that jointed a bottleneck feature extraction network and attention model is proposed. The model is composed of a Deep Belief Network as bottleneck feature extraction network and an attention-based encoder-decoder model. DBN can store the priori information from Hidden Markov Model so that increasing convergence speed of and enhancing both robustness and discrimination of features. Attention model utilizes the temporal information of feature sequence to calculate the posterior probability of phoneme. Then the number of stack recurrent neural network layers in attention model is reduced in order to decrease the calculation of gradient. Experiments in the TIMIT corpus showed that the phoneme error rate is 17.80% in test set, the average training iteration decreased 52%, and the number of training iterations decreased from 139 to 89. The word error rate of WSJ eval92 is 12.9% without any external language model.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Two-point boundary value problems for fuzzy differential equations under generalized differentiability Background subtraction via online box constrained RPCA Bayesian analysis for multivariate skew-normal reproductive dispersion random effects models A diversity-based method for class-imbalanced cost-sensitive learning The Merrifield-Simmons index of two classes of lexicographic product graphs of corona graphs
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1