Pruning exponential language models

2011 IEEE Workshop on Automatic Speech Recognition & Understanding, Pub Date: 2011-12-01, DOI: 10.1109/ASRU.2011.6163937

Stanley F. Chen, A. Sethy, B. Ramabhadran

Citations: 4

Abstract

Language model pruning is an essential technology for speech applications running on resource-constrained devices, and many pruning algorithms have been developed for conventional word n-gram models. However, while exponential language models can give superior performance, there has been little work on the pruning of these models. In this paper, we propose several pruning algorithms for general exponential language models. We show that our best algorithm applied to an exponential n-gram model outperforms existing n-gram model pruning algorithms by up to 0.4% absolute in speech recognition word-error rate on Wall Street Journal and Broadcast News data sets. In addition, we show that Model M, an exponential class-based language model, retains its performance improvement over conventional word n-gram models when pruned to equal size, with gains of up to 2.5% absolute in word-error rate.
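The abstract does not describe how the proposed pruning algorithms work. As a point of reference only, the sketch below shows the general setting: an exponential n-gram model computes p(w | h) = exp(sum of active feature weights) / Z(h), and removing a feature is equivalent to setting its weight to zero. The toy feature set, the weights, and the helper names (`active_features`, `prob`, `prune_by_magnitude`) are hypothetical, and the keep-the-largest-|weight| criterion is a naive baseline chosen for illustration. It is not one of the algorithms proposed in the paper.

```python
import math

# Hypothetical toy exponential n-gram model. Each feature is a
# (history suffix, word) pair; a missing feature has weight 0.


def active_features(history, word, order):
    """Yield features (suffix of history, word) for suffix lengths 0..order-1."""
    for k in range(order):
        if k > len(history):
            break
        # For k == 0 this slice is empty, giving the unigram feature ((), word).
        yield (tuple(history[len(history) - k:]), word)


def prob(weights, vocab, history, word, order):
    """p(word | history) = exp(score(word)) / sum over vocab of exp(score(v))."""
    def score(w):
        return sum(weights.get(f, 0.0) for f in active_features(history, w, order))

    z = sum(math.exp(score(v)) for v in vocab)
    return math.exp(score(word)) / z


def prune_by_magnitude(weights, keep):
    """Naive baseline (assumption): keep the `keep` features with largest |weight|."""
    ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ranked[:keep])


vocab = ["a", "b", "c"]
weights = {
    ((), "a"): 0.5,
    ((), "b"): 0.2,
    (("a",), "b"): 1.5,
    (("b",), "c"): 0.05,
}
pruned = prune_by_magnitude(weights, keep=2)
for w in vocab:
    print(w, prob(weights, vocab, ["a"], w, 2), prob(pruned, vocab, ["a"], w, 2))
```

Because the model is normalized over the vocabulary for every history, the pruned model still yields a proper distribution; pruning only changes which feature weights contribute to each score. A practical pruning method would typically use a criterion tied to model likelihood or task performance rather than raw weight magnitude.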



