Language modeling by variable length sequences: theoretical formulation and evaluation of multigrams

1995 International Conference on Acoustics, Speech, and Signal Processing. Pub Date: 1995-05-09. DOI: 10.1109/ICASSP.1995.479391

Sabine Deligne, F. Bimbot

Citations: 178

Abstract

The multigram model assumes that language can be described as the output of a memoryless source that emits variable-length sequences of words. The estimation of the model parameters can be formulated as a maximum likelihood estimation problem from incomplete data. We show that estimates of the model parameters can be computed through an iterative expectation-maximization algorithm and we describe a forward-backward procedure for its implementation. We report the results of a systematic evaluation of multigrams for language modeling on the ATIS database. The objective performance measure is the test set perplexity. Our results show that multigrams outperform conventional n-grams for this task.
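The abstract names an expectation-maximization procedure implemented with forward-backward sums but gives no formulas, so the following is only a minimal sketch of how such an estimate could be computed, not the authors' implementation. It assumes that the likelihood of a sentence is the sum, over all segmentations into sequences of at most `max_len` words, of the product of the sequence probabilities. The function names `forward_backward` and `em_step`, the `max_len` parameter, and the dictionary representation of the model are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Seq = Tuple[str, ...]  # a variable-length sequence of words


def forward_backward(words: List[str], probs: Dict[Seq, float], max_len: int):
    """Sum the likelihood over all segmentations of `words` (assumed model).

    alpha[t] is the likelihood of words[:t]; beta[t] is that of words[t:].
    """
    T = len(words)
    alpha = [0.0] * (T + 1)
    beta = [0.0] * (T + 1)
    alpha[0] = 1.0
    beta[T] = 1.0
    for t in range(1, T + 1):
        for l in range(1, min(max_len, t) + 1):
            alpha[t] += alpha[t - l] * probs.get(tuple(words[t - l:t]), 0.0)
    for t in range(T - 1, -1, -1):
        for l in range(1, min(max_len, T - t) + 1):
            beta[t] += probs.get(tuple(words[t:t + l]), 0.0) * beta[t + l]
    return alpha, beta


def em_step(corpus: List[List[str]], probs: Dict[Seq, float], max_len: int):
    """One EM iteration: expected sequence counts, then renormalization."""
    counts: Dict[Seq, float] = defaultdict(float)
    for words in corpus:
        alpha, beta = forward_backward(words, probs, max_len)
        total = alpha[len(words)]  # likelihood of the whole sentence
        if total == 0.0:
            continue  # sentence cannot be segmented under the current model
        for t in range(len(words)):
            for l in range(1, min(max_len, len(words) - t) + 1):
                seg = tuple(words[t:t + l])
                p = probs.get(seg, 0.0)
                if p > 0.0:
                    # Posterior probability that `seg` is emitted at position t.
                    counts[seg] += alpha[t] * p * beta[t + l] / total
    norm = sum(counts.values())
    return {seg: c / norm for seg, c in counts.items()}


# Toy model with made-up probabilities, for illustration only.
toy = {("a",): 0.5, ("b",): 0.3, ("a", "b"): 0.2}
alpha, beta = forward_backward(["a", "b"], toy, max_len=2)
updated = em_step([["a", "b"]], toy, max_len=2)
```

In this toy case the sentence "a b" has two segmentations, [a][b] and [a b], so its likelihood is 0.5 × 0.3 + 0.2 = 0.35, and the forward and backward passes both arrive at that value. Repeating `em_step` until the likelihood stops improving would give the iterative estimate; how the probabilities are initialized and when to stop are left open here because the abstract does not specify them.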
