{"title":"Coding for compression in full-text retrieval systems","authors":"Alistair Moffat, J. Zobel","doi":"10.1109/DCC.1992.227474","DOIUrl":null,"url":null,"abstract":"Witten, Bell and Nevill (see ibid., p.23, 1991) have described compression models for use in full-text retrieval systems. The authors discuss other coding methods for use with the same models, and give results that show their scheme yielding virtually identical compression, and decoding more than forty times faster. One of the main features of their implementation is the complete absence of arithmetic coding; this, in part, is the reason for the high speed. The implementation is also particularly suited to slow devices such as CD-ROM, in that the answering of a query requires one disk access for each term in the query and one disk access for each answer. All words and numbers are indexed, and there are no stop words. They have built two compressed databases.","PeriodicalId":170269,"journal":{"name":"Data Compression Conference, 1992.","volume":"197 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1992-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"42","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Compression Conference, 1992.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC.1992.227474","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 42
Abstract
Witten, Bell and Nevill (see ibid., p.23, 1991) have described compression models for use in full-text retrieval systems. The authors discuss other coding methods for use with the same models, and give results showing that their scheme yields virtually identical compression while decoding more than forty times faster. One of the main features of their implementation is the complete absence of arithmetic coding; this is, in part, the reason for the high speed. The implementation is also particularly suited to slow devices such as CD-ROM, because answering a query requires only one disk access for each term in the query and one disk access for each answer. All words and numbers are indexed, and there are no stop words. They have built two compressed databases.
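
The abstract does not name the codes used in place of arithmetic coding. As an illustration only, the sketch below uses the Elias gamma code, a standard bit-level integer code that needs no arithmetic coder, to compress the gaps between document numbers in one term's index entry. The choice of gamma coding, the gap representation, and all function names (`gamma_encode`, `encode_postings`, and so on) are assumptions made for this example, not details taken from the paper. Bits are kept as a string of '0' and '1' characters for readability rather than packed into bytes.

```python
def gamma_encode(n):
    """Return the Elias gamma code of a positive integer as a '0'/'1' string."""
    if n < 1:
        raise ValueError("gamma codes are defined only for integers >= 1")
    binary = bin(n)[2:]                      # binary form, always starts with '1'
    return "0" * (len(binary) - 1) + binary  # unary length prefix, then the binary form


def gamma_decode_all(bits):
    """Decode a concatenation of gamma codes back into a list of integers."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":                # count the unary prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def encode_postings(doc_ids):
    """Gamma-code the gaps of a strictly increasing list of document numbers (>= 1)."""
    previous = 0
    codes = []
    for doc_id in doc_ids:
        codes.append(gamma_encode(doc_id - previous))
        previous = doc_id
    return "".join(codes)


def decode_postings(bits):
    """Rebuild the document numbers by summing the decoded gaps."""
    doc_ids = []
    total = 0
    for gap in gamma_decode_all(bits):
        total += gap
        doc_ids.append(total)
    return doc_ids


if __name__ == "__main__":
    postings = [3, 7, 8, 20]                 # hypothetical documents containing one term
    encoded = encode_postings(postings)
    print(encoded, decode_postings(encoded))
```

Decoding here needs only bit tests, shifts and additions, which is the kind of simplicity that makes a non-arithmetic code fast to decode; whether the authors' scheme resembles this one is not stated in the abstract.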