Development of the MIT ASR system for the 2016 Arabic Multi-genre Broadcast Challenge
T. A. Hanai, Wei-Ning Hsu, James R. Glass
2016 IEEE Spoken Language Technology Workshop (SLT), December 2016
DOI: 10.1109/SLT.2016.7846280 (https://doi.org/10.1109/SLT.2016.7846280)
Citations: 15
Abstract
The Arabic language, with over 300 million speakers, has significant diversity and breadth. This proves challenging when building an automated system to understand what is said. This paper describes an Arabic Automatic Speech Recognition system developed on a 1,200 hour speech corpus that was made available for the 2016 Arabic Multi-genre Broadcast (MGB) Challenge. A range of Deep Neural Network (DNN) topologies were modeled, including: Feed-forward, Convolutional, Time-Delay, Recurrent Long Short-Term Memory (LSTM), Highway LSTM (H-LSTM), and Grid LSTM (G-LSTM). The best performance came from a sequence discriminatively trained G-LSTM neural network. The best overall Word Error Rate (WER) was 18.3% (p < 0.001) on the development set, after combining hypotheses of 3 and 5 layer sequence discriminatively trained G-LSTM models that had been rescored with a 4-gram language model.
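The abstract's headline figure is Word Error Rate. For readers unfamiliar with the metric, a minimal sketch of how WER is computed is shown below: the word-level Levenshtein (edit) distance between reference and hypothesis, divided by the reference length. This is an illustrative implementation, not the MGB Challenge's official scoring script.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    r, h = reference.split(), hypothesis.split()
    # Dynamic-programming table: d[i][j] = edit distance between
    # the first i reference words and the first j hypothesis words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            substitution = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(r)][len(h)] / len(r)
```

For example, a hypothesis with one substituted word out of four reference words scores a WER of 0.25; a WER of 18.3%, as reported in the paper, means roughly one word error per five to six reference words.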