An open interface for probabilistic models of text

Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096). Pub Date: 1999-03-29. DOI: 10.1109/DCC.1999.785679

J. Cleary, W. Teahan


Citations: 6

Abstract



Summary form only given. An application program interface (API) for modelling sequential text is described. The API is intended to shield the user from details of the modelling and probability estimation process. This should enable different implementations of models to be replaced transparently in application programs. The motivation for this API is work on the use of textual models for applications in addition to strict data compression. The API is probabilistic, that is, it supplies the probability of the next symbol in the sequence. It is general enough to deal accurately with models that include escapes for probabilities. The concepts abstracted by the API are explained together with details of the API calls. Such predictive models can be used for a number of applications other than compression. Users of the models do not want to be concerned about the details either of the implementation of the models or of how they were trained and the sources of the training text. The problem considered is how to permit code for different models and actual trained models themselves to be interchanged easily between users. The fundamental idea is that it should be possible to write application programs independent of the details of particular modelling code, that it should be possible to implement different modelling code independent of the various applications, and that it should be possible to easily exchange different pre-trained models between users. It is hoped that this independence will foster the exchange and use of high-performance modelling code, the construction of sophisticated adaptive systems based on the best available models, the proliferation and provision of high-quality models of standard text types such as English or other natural languages, and the easy comparison of different modelling techniques.
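
The separation the abstract describes, in which application code is written against an interface that supplies the probability of the next symbol and model implementations are swapped behind it, can be illustrated with a minimal Python sketch. The abstract does not list the actual API calls, so every name here (`TextModel`, `probability`, `update`, `UniformModel`, `AdaptiveOrder0Model`, `code_length_bits`) is a hypothetical stand-in rather than the authors' interface. The sketch also leaves out escape probabilities, which the described API is said to handle.

```python
import math
from collections import Counter
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical interface; these names are illustrative, not the paper's API calls."""

    def probability(self, symbol: str) -> float:
        """Return the probability that `symbol` is the next symbol in the sequence."""
        ...

    def update(self, symbol: str) -> None:
        """Advance the model's context past `symbol`."""
        ...


class UniformModel:
    """Static model: every symbol of a fixed alphabet is equally likely."""

    def __init__(self, alphabet: str) -> None:
        self._p = 1.0 / len(set(alphabet))

    def probability(self, symbol: str) -> float:
        return self._p

    def update(self, symbol: str) -> None:
        pass  # a static model has no state to advance


class AdaptiveOrder0Model:
    """Adaptive order-0 model with add-one smoothing over a fixed alphabet.

    Symbols outside the alphabet are not handled here; a real model
    would typically deal with them through an escape mechanism.
    """

    def __init__(self, alphabet: str) -> None:
        self._counts = Counter({s: 1 for s in set(alphabet)})
        self._total = len(self._counts)

    def probability(self, symbol: str) -> float:
        return self._counts[symbol] / self._total

    def update(self, symbol: str) -> None:
        self._counts[symbol] += 1
        self._total += 1


def code_length_bits(model: TextModel, text: str) -> float:
    """Application code: ideal code length of `text`, using only the interface."""
    bits = 0.0
    for symbol in text:
        bits -= math.log2(model.probability(symbol))
        model.update(symbol)
    return bits


if __name__ == "__main__":
    # The same application function runs unchanged with either model.
    for model in (UniformModel("ab"), AdaptiveOrder0Model("ab")):
        print(type(model).__name__, code_length_bits(model, "aaaaaaab"))
```

Because `code_length_bits` touches only `probability` and `update`, a different model can replace either class without any change to the application, which is the kind of independence the abstract argues for.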
