{"title":"Language modeling by variable length sequences: theoretical formulation and evaluation of multigrams","authors":"Sabine Deligne, F. Bimbot","doi":"10.1109/ICASSP.1995.479391","DOIUrl":null,"url":null,"abstract":"The multigram model assumes that language can be described as the output of a memoryless source that emits variable-length sequences of words. The estimation of the model parameters can be formulated as a maximum likelihood estimation problem from incomplete data. We show that estimates of the model parameters can be computed through an iterative expectation-maximization algorithm and we describe a forward-backward procedure for its implementation. We report the results of a systematical evaluation of multigrams for language modeling on the ATIS database. The objective performance measure is the test set perplexity. Our results show that multigrams outperform conventional n-grams for this task.","PeriodicalId":300119,"journal":{"name":"1995 International Conference on Acoustics, Speech, and Signal Processing","volume":"65 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"178","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"1995 International Conference on Acoustics, Speech, and Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.1995.479391","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 178
Abstract
The multigram model assumes that language can be described as the output of a memoryless source that emits variable-length sequences of words. The estimation of the model parameters can be formulated as a maximum likelihood estimation problem from incomplete data. We show that estimates of the model parameters can be computed through an iterative expectation-maximization algorithm, and we describe a forward-backward procedure for its implementation. We report the results of a systematic evaluation of multigrams for language modeling on the ATIS database. The objective performance measure is the test-set perplexity. Our results show that multigrams outperform conventional n-grams for this task.
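The abstract outlines three computations: the likelihood of a word string as a sum over all segmentations into variable-length sequences, EM re-estimation of the sequence probabilities via a forward-backward pass, and scoring by test-set perplexity. The following is a minimal, self-contained sketch of these steps in Python. The toy corpus, the maximum sequence length `MAX_LEN`, the number of iterations, and all identifiers are illustrative assumptions and are not taken from the paper.

```python
# Illustrative sketch of multigram estimation by EM with a forward-backward
# pass.  All constants and the corpus below are assumptions for illustration,
# not values from the paper.
from collections import defaultdict
from math import log, exp

MAX_LEN = 3   # longest word sequence the model may emit (assumption)
N_ITERS = 10  # number of EM iterations (assumption)

corpus = [
    "show me flights from boston to denver".split(),
    "show me flights from denver to boston".split(),
    "list flights from boston".split(),
]

# Initialisation: uniform probability over every sequence observed in the corpus.
probs = defaultdict(float)
for sent in corpus:
    for i in range(len(sent)):
        for k in range(1, MAX_LEN + 1):
            if i + k <= len(sent):
                probs[tuple(sent[i:i + k])] = 1.0
norm = sum(probs.values())
for seq in probs:
    probs[seq] /= norm

def forward(sent):
    """alpha[t]: likelihood of the first t words, summed over all segmentations."""
    T = len(sent)
    alpha = [0.0] * (T + 1)
    alpha[0] = 1.0
    for t in range(1, T + 1):
        for k in range(1, min(MAX_LEN, t) + 1):
            alpha[t] += alpha[t - k] * probs.get(tuple(sent[t - k:t]), 0.0)
    return alpha

def backward(sent):
    """beta[t]: likelihood of the words after position t."""
    T = len(sent)
    beta = [0.0] * (T + 1)
    beta[T] = 1.0
    for t in range(T - 1, -1, -1):
        for k in range(1, min(MAX_LEN, T - t) + 1):
            beta[t] += probs.get(tuple(sent[t:t + k]), 0.0) * beta[t + k]
    return beta

for _ in range(N_ITERS):
    counts = defaultdict(float)
    for sent in corpus:
        T = len(sent)
        alpha, beta = forward(sent), backward(sent)
        like = alpha[T]
        if like == 0.0:
            continue
        # E-step: expected number of times each sequence is used in a segmentation.
        for t in range(T):
            for k in range(1, min(MAX_LEN, T - t) + 1):
                seq = tuple(sent[t:t + k])
                p = probs.get(seq, 0.0)
                if p > 0.0:
                    counts[seq] += alpha[t] * p * beta[t + k] / like
    # M-step: renormalise the expected counts into a probability distribution.
    total = sum(counts.values())
    probs = defaultdict(float, {s: c / total for s, c in counts.items()})

# Per-word perplexity, here computed on the training corpus itself (illustration only);
# the paper reports perplexity on a held-out ATIS test set.
log_like, n_words = 0.0, 0
for sent in corpus:
    log_like += log(forward(sent)[len(sent)])
    n_words += len(sent)
print("perplexity:", exp(-log_like / n_words))
```

The forward recursion sums, at each position, over the last 1 to `MAX_LEN` words forming the final sequence of a segmentation; combined with the backward pass, it yields the expected sequence counts needed for the EM update without enumerating segmentations explicitly. The reported perplexity is exp(-log L / T) for T words, the usual per-word measure.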