Sequence training of DNN acoustic models with natural gradient

2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Pub Date: 2017-12-01, DOI: 10.1109/ASRU.2017.8268933

Adnan Haider, P. Woodland

Citations: 6

Abstract

Deep Neural Network (DNN) acoustic models often use discriminative sequence training, which optimises an objective function that approximates the word error rate (WER) more closely than the criterion used in frame-based training. Sequence training is normally implemented using Stochastic Gradient Descent (SGD) or Hessian Free (HF) training. This paper proposes an alternative batch-style optimisation framework that employs a Natural Gradient (NG) approach to traverse the parameter space. By correcting the gradient according to the local curvature of the KL-divergence, the NG optimisation process converges more quickly than HF. Furthermore, the proposed NG approach can be applied to any sequence discriminative training criterion. The efficacy of the NG method is shown by experiments on a Multi-Genre Broadcast (MGB) transcription task, which demonstrate both the computational efficiency of the method and the accuracy of the resulting DNN models.
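The central idea of correcting the gradient by the local curvature of the KL-divergence can be illustrated on a much smaller model. The Python sketch below is a generic natural-gradient step, not the authors' implementation: it assumes a toy logistic-regression model in place of a DNN and the mean negative log-likelihood in place of a sequence criterion. The update is θ ← θ − η (F(θ) + λI)⁻¹ ∇L(θ), where F(θ) is the Fisher information matrix, i.e. the Hessian of KL(p_θ ‖ p_θ') with respect to θ' evaluated at θ' = θ, and λ is a damping term. The direction is found by conjugate gradient using only Fisher-vector products, so F is never formed explicitly; this mirrors Hessian-free practice, but whether the paper computes its NG direction this way is an assumption. All function names, the damping value and the data are hypothetical.

```python
import math

# Hypothetical toy setting: logistic regression stands in for a DNN, and the
# mean negative log-likelihood stands in for a sequence training criterion.


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def loss(theta, xs, ys):
    # Mean negative log-likelihood of Bernoulli labels.
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(dot(theta, x))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def gradient(theta, xs, ys):
    # dL/dtheta = mean of (p - y) * x.
    g = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        p = sigmoid(dot(theta, x))
        for i, xi in enumerate(x):
            g[i] += (p - y) * xi
    return [gi / len(xs) for gi in g]


def fisher_vector_product(theta, xs, v, damping):
    # (F + damping * I) v with F = mean of p(1 - p) x x^T, the Fisher
    # information of the model's output distribution; F is never formed.
    out = [damping * vi for vi in v]
    for x in xs:
        p = sigmoid(dot(theta, x))
        scale = p * (1 - p) * dot(x, v) / len(xs)
        for i, xi in enumerate(x):
            out[i] += scale * xi
    return out


def conjugate_gradient(apply_a, b, iters=50, tol=1e-20):
    # Solve A d = b for symmetric positive-definite A using only products A v.
    d = [0.0] * len(b)
    r = list(b)
    search = list(r)
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        a_search = apply_a(search)
        alpha = rs / dot(search, a_search)
        d = [di + alpha * si for di, si in zip(d, search)]
        r = [ri - alpha * ai for ri, ai in zip(r, a_search)]
        rs_new = dot(r, r)
        search = [ri + (rs_new / rs) * si for ri, si in zip(r, search)]
        rs = rs_new
    return d


def natural_gradient_step(theta, xs, ys, lr=1.0, damping=1e-3):
    # theta <- theta - lr * (F + damping * I)^{-1} grad L
    g = gradient(theta, xs, ys)
    direction = conjugate_gradient(
        lambda v: fisher_vector_product(theta, xs, v, damping), g)
    return [t - lr * di for t, di in zip(theta, direction)]


# Hypothetical non-separable 1-D data with a bias feature.
XS = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 0.5]]
YS = [0, 0, 1, 0, 1, 1]

if __name__ == "__main__":
    theta = [0.0, 0.0]
    for step in range(3):
        theta = natural_gradient_step(theta, XS, YS)
        print(step, theta, loss(theta, XS, YS))
```

For this canonical-link toy model the Fisher matrix equals the Hessian of the loss, so the step coincides with a damped Newton step; for a DNN trained with a sequence criterion this equivalence does not generally hold.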


