A prioritized grid long short-term memory RNN for speech recognition

2016 IEEE Spoken Language Technology Workshop (SLT). Pub Date: 2016-12-01. DOI: 10.1109/SLT.2016.7846305

Wei-Ning Hsu, Yu Zhang, James R. Glass

Citations: 31

Abstract

Recurrent neural networks (RNNs) are naturally suited to speech recognition because of their ability to exploit dynamically changing temporal information. Deep RNNs have been argued to be able to model temporal relationships at different time granularities, but they suffer from the vanishing gradient problem. In this paper, we extend stacked long short-term memory (LSTM) RNNs with grid LSTM blocks, which formulate computation along not only the temporal dimension but also the depth dimension, in order to alleviate this issue. Moreover, we prioritize the depth dimension over the temporal one so that the depth dimension receives more up-to-date information, since its output is the one used for classification. We call this model the prioritized Grid LSTM (pGLSTM). Extensive experiments on four large datasets (AMI, HKUST, GALE, and MGB) indicate that the pGLSTM outperforms alternative deep LSTM models, beating stacked LSTMs by 4% to 7% relative, and achieves new benchmarks among uni-directional models on all datasets.
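
The abstract does not give the cell equations, so the following is only a minimal sketch of one plausible reading of a prioritized grid LSTM, written in pure Python to stay dependency-free. Assumptions (not from the paper): both the temporal and depth dimensions use a standard LSTM cell with input, forget, output, and candidate gates; at each layer the temporal LSTM is computed first, and the depth LSTM then consumes its freshly updated output (the "prioritization"); a hypothetical linear projection `proj` maps acoustic features into the depth state entering layer 0. All class and function names are illustrative.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(W, v):
    # Dense matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """Standard LSTM cell; gates are computed from concat(input, hidden)."""

    def __init__(self, in_dim, hid_dim, rng):
        scale = 1.0 / math.sqrt(in_dim + hid_dim)
        # One weight matrix and bias per gate: i(nput), f(orget), o(utput), c(andidate).
        self.W = {k: [[rng.uniform(-scale, scale) for _ in range(in_dim + hid_dim)]
                      for _ in range(hid_dim)] for k in "ifoc"}
        self.b = {k: [0.0] * hid_dim for k in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, cand)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new


class PrioritizedGridLSTM:
    """Hypothetical sketch: a temporal LSTM and a depth LSTM per layer, depth computed last."""

    def __init__(self, feat_dim, hid_dim, num_layers, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(feat_dim)
        # Hypothetical input projection into the depth state entering layer 0.
        self.proj = [[rng.uniform(-scale, scale) for _ in range(feat_dim)]
                     for _ in range(hid_dim)]
        self.time_cells = [LSTMCell(hid_dim, hid_dim, rng) for _ in range(num_layers)]
        self.depth_cells = [LSTMCell(hid_dim, hid_dim, rng) for _ in range(num_layers)]
        self.hid_dim = hid_dim
        self.num_layers = num_layers

    def forward(self, frames):
        zeros = [0.0] * self.hid_dim
        time_state = [(zeros, zeros) for _ in range(self.num_layers)]
        outputs = []
        for x in frames:
            h_d, c_d = matvec(self.proj, x), zeros  # depth state entering layer 0
            for l in range(self.num_layers):
                h_t, c_t = time_state[l]
                # 1) Temporal LSTM first: its input is the depth hidden state from below.
                h_t, c_t = self.time_cells[l].step(h_d, h_t, c_t)
                time_state[l] = (h_t, c_t)
                # 2) Depth LSTM second ("prioritized"): it sees the just-updated temporal
                #    output and carries its own memory cell c_d upward through the layers.
                h_d, c_d = self.depth_cells[l].step(h_t, h_d, c_d)
            outputs.append(h_d)  # top depth output would feed the frame classifier
        return outputs


if __name__ == "__main__":
    model = PrioritizedGridLSTM(feat_dim=4, hid_dim=3, num_layers=2)
    outs = model.forward([[0.1, -0.2, 0.3, 0.0] for _ in range(5)])
    print(len(outs), len(outs[0]))  # prints: 5 3
```

The point of the depth LSTM in this sketch is that its memory cell `c_d` gives gradients a gated path through the layers, analogous to the path the temporal cell provides through time; ordering it after the temporal update is how the sketch interprets "more up-to-date information" for the classifier-facing output.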
