Markov random fields on graphs for natural languages

J. O’Sullivan, K. Mark, M. Miller
{"title":"自然语言图上的马尔可夫随机场","authors":"J. O’Sullivan, K. Mark, M. Miller","doi":"10.1109/WITS.1994.513880","DOIUrl":null,"url":null,"abstract":"The use of model-based methods for data compression for English dates back at least to Shannon's Markov chain (n-gram) models, where the probability of the next word given all previous words equals the probability of the next word given the previous n-1 words. A second approach seeks to model the hierarchical nature of language via tree graph structures arising from a context-free language (CFL). Neither the n-gram nor the CFL models approach the data compression predicted by the entropy of English as estimated by Shannon and Cover and King. This paper presents two models that incorporate the benefits of both the n-gram model and the tree-based models. In either case the neighborhood structure on the syntactic variables is determined by the tree while the neighborhood structure of the words is determined by the n-gram and the parent syntactic variable (preterminal) in the tree, Having both types of neighbors for the words should yield decreased entropy of the model and hence fewer bits per word in data compression. To motivate estimation of model parameters, some results in estimating parameters for random branching processes is reviewed.","PeriodicalId":423518,"journal":{"name":"Proceedings of 1994 Workshop on Information Theory and Statistics","volume":"97 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1994-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Markov random fields on graphs for natural languages\",\"authors\":\"J. O’Sullivan, K. Mark, M. Miller\",\"doi\":\"10.1109/WITS.1994.513880\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The use of model-based methods for data compression for English dates back at least to Shannon's Markov chain (n-gram) models, where the probability of the next word given all previous words equals the probability of the next word given the previous n-1 words. A second approach seeks to model the hierarchical nature of language via tree graph structures arising from a context-free language (CFL). Neither the n-gram nor the CFL models approach the data compression predicted by the entropy of English as estimated by Shannon and Cover and King. This paper presents two models that incorporate the benefits of both the n-gram model and the tree-based models. In either case the neighborhood structure on the syntactic variables is determined by the tree while the neighborhood structure of the words is determined by the n-gram and the parent syntactic variable (preterminal) in the tree, Having both types of neighbors for the words should yield decreased entropy of the model and hence fewer bits per word in data compression. 
To motivate estimation of model parameters, some results in estimating parameters for random branching processes is reviewed.\",\"PeriodicalId\":423518,\"journal\":{\"name\":\"Proceedings of 1994 Workshop on Information Theory and Statistics\",\"volume\":\"97 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1994-10-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of 1994 Workshop on Information Theory and Statistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WITS.1994.513880\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 1994 Workshop on Information Theory and Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WITS.1994.513880","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

The use of model-based methods for data compression of English dates back at least to Shannon's Markov chain (n-gram) models, in which the probability of the next word given all previous words equals its probability given only the previous n-1 words. A second approach models the hierarchical nature of language via the tree graph structures arising from a context-free language (CFL). Neither the n-gram nor the CFL models approach the data compression predicted by the entropy of English as estimated by Shannon and by Cover and King. This paper presents two models that combine the benefits of the n-gram model and the tree-based models. In either case the neighborhood structure on the syntactic variables is determined by the tree, while the neighborhood structure of the words is determined by the n-gram and by the parent syntactic variable (the preterminal) in the tree. Giving the words both types of neighbors should decrease the entropy of the model and hence the number of bits per word in data compression. To motivate the estimation of model parameters, some results on estimating parameters for random branching processes are reviewed.
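The hybrid neighborhood described in the abstract lends itself to a short illustration. The following Python sketch is not the paper's model: the toy corpus, the tag names, and the `p_word` helper are all hypothetical. It conditions each word on both its bigram predecessor and its parent preterminal, then computes the model's average bits per word, the quantity the abstract ties to compression performance.

```python
# Minimal sketch (not the paper's model): a word distribution conditioned on
# both its n-gram history and its parent preterminal, combining the two
# neighbor types the abstract describes. Corpus and tags are hypothetical.
import math
from collections import Counter, defaultdict

# Toy parsed corpus: (word, preterminal) pairs in running order.
corpus = [
    ("the", "DT"), ("dog", "NN"), ("barks", "VBZ"),
    ("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ"),
    ("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ"),
]

# Count words by (previous word, preterminal) context: a bigram history
# plus the parent syntactic variable from the tree.
context_counts = defaultdict(Counter)
prev = "<s>"
for word, tag in corpus:
    context_counts[(prev, tag)][word] += 1
    prev = word

def p_word(word, prev_word, tag):
    """Maximum-likelihood P(word | previous word, parent preterminal)."""
    counts = context_counts[(prev_word, tag)]
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

# Average bits per word under the model: -log2 P(word | context).
# A lower model entropy means fewer bits per word in compression.
bits = 0.0
prev = "<s>"
for word, tag in corpus:
    bits += -math.log2(p_word(word, prev, tag))
    prev = word
print(f"average bits/word: {bits / len(corpus):.3f}")
```

Evaluating the same held-out text under an n-gram-only model and under this joint context would show whether the extra preterminal neighbor actually lowers the bits per word, which is the comparison the paper's two models are designed to make.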