Efficient nearly error-less LVCSR decoding based on incremental forward and backward passes

2013 IEEE Workshop on Automatic Speech Recognition and Understanding. Pub Date: 2013-12-01. DOI: 10.1109/ASRU.2013.6707707

David Nolden, R. Schlüter, H. Ney

Citations: 2

Abstract

We show that most search errors can be identified by aligning the results of a symmetric forward and backward decoding pass. Based on this knowledge, we introduce an efficient high-level decoding architecture that yields virtually no search errors and requires virtually no manual tuning. We perform an initial forward and backward decoding with tight initial beams, identify search errors, and then recursively increase the beam sizes and perform new forward and backward decodings for the erroneous intervals until no more search errors are detected. Consequently, each utterance, and even each single word, is decoded with the smallest beam size required to decode it correctly. On all tested systems, we achieve an error rate equal or very close to that of classical decoding with an ideally tuned beam size, but without supervision or system-specific tuning, and at around 2 times faster runtime. An additional speedup by a factor of 2 can be achieved by decoding the forward and backward passes in separate threads.
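The control loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Word` tuple, the decoder callables `decode_fwd` and `decode_bwd`, the parameters `beam_step` and `max_beam`, and the rule that two hypotheses agree only when label and frame boundaries match exactly are all assumptions made for illustration. The sketch also re-decodes each erroneous interval in isolation, which a real LVCSR decoder would likely not do because of acoustic and language-model context.

```python
from typing import Callable, List, Tuple

# Hypothetical types: a word hypothesis is (label, start_frame, end_frame),
# and a decoder maps (start_frame, end_frame, beam) to a list of words.
Word = Tuple[str, int, int]
Decoder = Callable[[int, int, float], List[Word]]


def disagreement_intervals(
    fwd: List[Word], bwd: List[Word], start: int, end: int
) -> List[Tuple[int, int]]:
    """Return frame intervals in [start, end) not covered by agreed words.

    Assumption: two passes agree on a word only if label and both
    boundaries are identical. Uncovered frames are treated as
    suspected search errors.
    """
    agreed = sorted(set(fwd) & set(bwd), key=lambda w: w[1])
    gaps = []
    cursor = start
    for _, s, e in agreed:
        if s > cursor:
            gaps.append((cursor, s))
        cursor = max(cursor, e)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


def decode_incremental(
    decode_fwd: Decoder,
    decode_bwd: Decoder,
    start: int,
    end: int,
    beam: float,
    beam_step: float,
    max_beam: float,
) -> List[Word]:
    """Decode [start, end) and widen the beam only where the passes disagree."""
    fwd = decode_fwd(start, end, beam)
    bwd = decode_bwd(start, end, beam)
    gaps = disagreement_intervals(fwd, bwd, start, end)
    if not gaps or beam >= max_beam:
        # Either the passes agree everywhere, or the beam budget is spent;
        # in the latter case fall back to the forward result.
        return fwd
    result = list(set(fwd) & set(bwd))
    for s, e in gaps:
        # Re-decode only the erroneous interval with a larger beam.
        result.extend(
            decode_incremental(
                decode_fwd, decode_bwd, s, e, beam + beam_step, beam_step, max_beam
            )
        )
    return sorted(result, key=lambda w: w[1])
```

Because the beam grows only inside intervals where the two passes disagree, regions that are already decoded consistently keep the tight initial beam, which is the source of the runtime saving the abstract reports. The two decoder calls per interval are independent of each other, so they could in principle run in separate threads.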
