Adapting n-gram maximum entropy language models with conditional entropy regularization

2011 IEEE Workshop on Automatic Speech Recognition & Understanding Pub Date : 2011-12-01 DOI:10.1109/ASRU.2011.6163934

A. Rastrow, Mark Dredze, S. Khudanpur

Citations: 5

Abstract

Accurate estimates of language model parameters are critical for building quality text generation systems, such as automatic speech recognition. However, text training data for a domain of interest is often unavailable. Instead, we use semi-supervised model adaptation: parameters are estimated using both unlabeled in-domain data (raw speech audio) and labeled out-of-domain data (text). In this work, we present a new semi-supervised language model adaptation procedure for Maximum Entropy models with n-gram features. We augment the conventional maximum likelihood training criterion on out-of-domain text data with an additional term to minimize conditional entropy on in-domain audio. Additionally, we demonstrate how to compute conditional entropy efficiently on speech lattices using first- and second-order expectation semirings. We demonstrate improvements in terms of word error rate over other adaptation techniques when adapting a maximum entropy language model from broadcast news to MIT lectures.
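To make the lattice computation mentioned in the abstract more concrete, the following is a minimal sketch of how a first-order expectation semiring can yield the entropy of the path distribution of a small acyclic lattice in a single forward pass. It is not the authors' implementation: the function name `lattice_entropy`, the arc representation, and the toy weights are all assumptions made for illustration. In the setting the abstract describes, the arc weights would presumably combine acoustic and language model scores, and the gradient of the entropy term would need the second-order semiring, which is not shown here.

```python
import math
from collections import defaultdict


def lattice_entropy(arcs, start, final):
    """Entropy (in nats) of the normalized path distribution of a lattice.

    arcs: list of (src, dst, weight) tuples with positive, possibly
          unnormalized weights. States are assumed (hypothetically) to be
          integers numbered in topological order, so src < dst on every arc.
    """
    # Each state carries a first-order expectation semiring value (p, r):
    #   p = total weight of all partial paths from `start` to the state
    #   r = sum over those paths of weight(path) * log(weight(path))
    value = defaultdict(lambda: (0.0, 0.0))
    value[start] = (1.0, 0.0)  # the semiring "one"

    # Sorting by source state visits arcs in topological order, because
    # every arc entering a state has a smaller source than arcs leaving it.
    for src, dst, w in sorted(arcs):
        p, r = value[src]
        arc_p, arc_r = w, w * math.log(w)
        # Semiring times: (p1 * p2, p1 * r2 + p2 * r1)
        ext_p = p * arc_p
        ext_r = p * arc_r + arc_p * r
        # Semiring plus: componentwise addition
        q_p, q_r = value[dst]
        value[dst] = (q_p + ext_p, q_r + ext_r)

    z, r = value[final]
    # H = log Z - (1/Z) * sum_path weight(path) * log(weight(path))
    return math.log(z) - r / z


# Toy lattice with three paths from state 0 to state 3 (made-up weights).
toy_arcs = [(0, 1, 0.6), (0, 2, 0.4), (1, 2, 0.5), (1, 3, 0.5), (2, 3, 1.0)]
print(lattice_entropy(toy_arcs, start=0, final=3))
```

The pass touches each arc once, so the cost is linear in the size of the lattice rather than in the number of paths, which is the point of the semiring formulation. Summing this quantity over the in-domain utterances would give one possible form of the conditional entropy term added to the training criterion.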



