A tree based binary encoding of text using LZW algorithm

T. Acharya, A. Mukherjee
Proceedings DCC '95 Data Compression Conference, 28 March 1995
DOI: 10.1109/DCC.1995.515573
Citations: 8

Abstract

Summary form only given. The most popular adaptive dictionary coding scheme used for text compression is the LZW algorithm. In the LZW algorithm, a changing dictionary contains the common strings encountered so far in the text. The dictionary can be represented by a dynamic trie. The input text is examined character by character, and the longest substring of the text that already exists in the trie (called a prefix string) is replaced by a pointer to the trie node that represents it. The motivation of our research is to explore a variation of the LZW algorithm for variable-length binary encoding of text (which we call the LZWA algorithm) and to develop a memory-based VLSI architecture for text compression. We propose a new methodology that represents the trie as a binary tree (which we call a binary trie) to maintain the dictionary used in the LZW scheme. This binary tree preserves all the properties of the trie and can easily be mapped into memory. As a result, the common substrings can be encoded using variable-length binary prefix codes, which allow the text to be decoded uniquely to its original form. The algorithm outperforms the usual LZW scheme when the text is small (usually less than 5 K). Depending on the characteristics of the text, the compression ratio improves by around 10-30% over the LZW scheme. However, its performance degrades for larger texts.
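The abstract describes the LZW dictionary as a dynamic trie that the encoder walks character by character, emitting a pointer to the node for the longest prefix string it finds. The sketch below illustrates plain LZW (not the authors' LZWA variant) with the trie stored as a binary tree, assuming the common first-child/next-sibling conversion. That layout and the names `Node`, `BinaryTrieDict`, `lzw_encode` and `lzw_decode` are hypothetical; the summary does not specify the paper's binary-trie layout or its variable-length prefix codes, so the output here is a list of plain integer codes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One dictionary entry stored as a binary-tree node.

    Assumption: the trie is converted to a binary tree with the standard
    first-child / next-sibling scheme; the paper's binary trie may differ.
    """
    char: str
    code: int
    child: Optional["Node"] = None    # first string that extends this one by a character
    sibling: Optional["Node"] = None  # next entry that shares the same parent


class BinaryTrieDict:
    """Hypothetical LZW dictionary backed by a first-child/next-sibling tree."""

    def __init__(self, alphabet: str) -> None:
        self.root = Node("", -1)  # empty string; its children are single characters
        self.next_code = 0
        for ch in alphabet:       # codes 0..len(alphabet)-1 follow alphabet order
            self.add(self.root, ch)

    def find(self, parent: Node, ch: str) -> Optional[Node]:
        # Scan the sibling chain of parent's children for a node labelled ch.
        node = parent.child
        while node is not None and node.char != ch:
            node = node.sibling
        return node

    def add(self, parent: Node, ch: str) -> Node:
        # Prepend a new entry (parent's string + ch) to parent's child list.
        node = Node(ch, self.next_code, sibling=parent.child)
        parent.child = node
        self.next_code += 1
        return node


def lzw_encode(text: str, alphabet: str) -> list[int]:
    trie = BinaryTrieDict(alphabet)
    codes: list[int] = []
    current = trie.root
    for ch in text:
        nxt = trie.find(current, ch)
        if nxt is not None:
            current = nxt  # prefix string plus ch is still in the dictionary
            continue
        restart = trie.find(trie.root, ch)
        if restart is None:
            raise ValueError(f"character {ch!r} is not in the alphabet")
        codes.append(current.code)  # pointer to the longest prefix string
        trie.add(current, ch)       # learn prefix string + ch
        current = restart           # start the next prefix string at ch
    if current is not trie.root:
        codes.append(current.code)  # flush the final prefix string
    return codes


def lzw_decode(codes: list[int], alphabet: str) -> str:
    table = list(alphabet)  # code -> string, same initial numbering as the encoder
    if not codes:
        return ""
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        else:  # code was created in the same step that used it
            entry = prev + prev[0]
        table.append(prev + entry[0])  # mirror the entry the encoder added
        out.append(entry)
        prev = entry
    return "".join(out)


print(lzw_encode("abababab", "ab"))  # [0, 1, 2, 4, 1]
```

Each node carries only two links regardless of alphabet size, which is why a binary-tree layout maps easily onto fixed-width memory words; the trade-off is a linear scan of the sibling chain at every step.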