Improving the efficiency of forward-backward algorithm using batched computation in TensorFlow

K. Sim, A. Narayanan, Tom Bagby, Tara N. Sainath, M. Bacchiani
{"title":"Improving the efficiency of forward-backward algorithm using batched computation in TensorFlow","authors":"K. Sim, A. Narayanan, Tom Bagby, Tara N. Sainath, M. Bacchiani","doi":"10.1109/ASRU.2017.8268944","DOIUrl":null,"url":null,"abstract":"Sequence-level losses are commonly used to train deep neural network acoustic models for automatic speech recognition. The forward-backward algorithm is used to efficiently compute the gradients of the sequence loss with respect to the model parameters. Gradient-based optimization is used to minimize these losses. Recent work has shown that the forward-backward algorithm can be efficiently implemented as a series of matrix operations. This paper further improves the forward-backward algorithm via batched computation, a technique commonly used to improve training speed by exploiting the parallel computation of matrix multiplication. Specifically, we show how batched computation of the forward-backward algorithm can be efficiently implemented using TensorFlow to handle variable-length sequences within a mini batch. Furthermore, we also show how the batched forward-backward computation can be used to compute the gradients of the connectionist temporal classification (CTC) and maximum mutual information (MMI) losses with respect to the logits. We show, via empirical benchmarks, that the batched forward-backward computation can speed up the CTC loss and gradient computation by about 183 times when run on GPU with a batch size of 256 compared to using a batch size of 1; and by about 22 times for lattice-free MMI using a trigram phone language model for the denominator.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"181 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2017.8268944","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

Sequence-level losses are commonly used to train deep neural network acoustic models for automatic speech recognition. The forward-backward algorithm is used to efficiently compute the gradients of the sequence loss with respect to the model parameters, and gradient-based optimization is used to minimize these losses. Recent work has shown that the forward-backward algorithm can be efficiently implemented as a series of matrix operations. This paper further improves the forward-backward algorithm via batched computation, a technique commonly used to improve training speed by exploiting the parallel computation of matrix multiplication. Specifically, we show how batched computation of the forward-backward algorithm can be efficiently implemented using TensorFlow to handle variable-length sequences within a mini-batch. Furthermore, we also show how the batched forward-backward computation can be used to compute the gradients of the connectionist temporal classification (CTC) and maximum mutual information (MMI) losses with respect to the logits. We show, via empirical benchmarks, that the batched forward-backward computation can speed up the CTC loss and gradient computation by about 183 times when run on a GPU with a batch size of 256 compared to using a batch size of 1, and by about 22 times for lattice-free MMI using a trigram phone language model for the denominator.
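To make the batched handling of variable-length sequences concrete, below is a minimal sketch (in TensorFlow 2, not the authors' implementation) of a forward recursion expressed as batched log-domain matrix operations. The tensor names and shapes (log_obs, log_trans, seq_lens) are assumptions for illustration; padded frames are masked so that each sequence's forward variables stop updating once its own length is reached.

```python
# A minimal sketch of a batched forward recursion in the log domain.
# Variable names and shapes are illustrative assumptions, not the paper's code.
import tensorflow as tf


def batched_forward(log_obs, log_trans, seq_lens):
    """Batched forward algorithm over padded sequences.

    Args:
      log_obs:   [B, T, S] per-frame log emission scores for each state.
      log_trans: [B, S, S] log transition scores; log_trans[b, i, j] scores
                 moving from state i to state j.
      seq_lens:  [B] number of valid frames in each sequence.

    Returns:
      [B] total log score of each sequence (summed over all end states).
    """
    # Initialise alpha with the first frame's emission scores.
    log_alpha0 = log_obs[:, 0, :]                          # [B, S]

    # Time-major remaining observations for tf.scan: [T-1, B, S].
    obs_rest = tf.transpose(log_obs[:, 1:, :], [1, 0, 2])
    frame_idx = tf.range(1, tf.shape(log_obs)[1])          # [T-1]

    def step(log_alpha, elems):
        obs_t, t = elems
        # Batched "matrix multiplication" in the log domain:
        # new_alpha[b, j] = logsumexp_i(alpha[b, i] + trans[b, i, j]) + obs[b, j]
        expanded = log_alpha[:, :, None] + log_trans       # [B, S, S]
        new_alpha = tf.reduce_logsumexp(expanded, axis=1) + obs_t
        # Freeze alpha for sequences whose padding region has started.
        valid = (t < tf.cast(seq_lens, t.dtype))[:, None]  # [B, 1]
        return tf.where(valid, new_alpha, log_alpha)

    log_alpha_final = tf.scan(step, (obs_rest, frame_idx),
                              initializer=log_alpha0)[-1]  # [B, S]
    # Total log score; a CTC or MMI graph would restrict this to its end states.
    return tf.reduce_logsumexp(log_alpha_final, axis=-1)
```

Because the mask simply freezes alpha for finished sequences, the whole mini-batch advances for the maximum number of frames using one batched logsumexp reduction per step. Differentiating the resulting log score with respect to the per-frame scores (for example with tf.GradientTape) yields the same state-occupancy quantities that an explicit backward pass provides, which is the role the batched forward-backward computation plays for the CTC and MMI gradients described in the paper.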