Constructing word-based text compression algorithms

Data Compression Conference, 1992. Publication date: 1992-03-24. DOI: 10.1109/DCC.1992.227475

R. Horspool, G. Cormack


Citations: 96

Abstract

Text compression algorithms are normally defined in terms of a source alphabet Sigma of 8-bit ASCII codes. The authors consider choosing Sigma to be an alphabet whose symbols are the words of English or, in general, alternate maximal strings of alphanumeric characters and nonalphanumeric characters. The compression algorithm would be able to take advantage of longer-range correlations between words and thus achieve better compression. The large size of Sigma leads to some implementation problems, but these are overcome to construct word-based LZW, word-based adaptive Huffman, and word-based context modelling compression algorithms.
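As an illustration of the alphabet described in the abstract, the following minimal sketch splits text into alternating maximal strings of alphanumeric and non-alphanumeric characters and maps each distinct string to an integer symbol. It is not the authors' implementation: the names `tokenize` and `build_alphabet`, the ASCII-only character classes, and the first-appearance numbering are all assumptions made for this example.

```python
import re

# Alternating maximal runs of alphanumeric and non-alphanumeric characters.
# Restricting "alphanumeric" to ASCII letters and digits is an assumption.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9]+")


def tokenize(text):
    """Split text into alternating alphanumeric / non-alphanumeric strings."""
    return TOKEN_RE.findall(text)


def build_alphabet(tokens):
    """Assign each distinct token an integer symbol in order of first appearance."""
    alphabet = {}
    for tok in tokens:
        alphabet.setdefault(tok, len(alphabet))
    return alphabet


text = "the cat, the dog."
tokens = tokenize(text)
alphabet = build_alphabet(tokens)
symbols = [alphabet[tok] for tok in tokens]

print(tokens)   # ['the', ' ', 'cat', ', ', 'the', ' ', 'dog', '.']
print(symbols)  # [0, 1, 2, 3, 0, 1, 4, 5]
```

Because every character belongs to exactly one of the two classes, concatenating the tokens reproduces the input exactly. The symbol stream could then be fed to a coder in place of a byte stream. The alphabet grows with the text, which is one way to see the "large size of Sigma" problem the abstract mentions.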

