Design consideration for multi-lingual cascading text compressors

Chi-Hung Chi, Yan Zhang
DOI: 10.1109/DCC.1999.785677
Journal: Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096)
Published: 1999-03-29
Citations: 1

Abstract

Summary form only given. We study the cascading of LZ variants with Huffman coding for multilingual documents. Two models are proposed: a static model and an adaptive (dynamic) model. The static model uses the dictionary generated by the LZW algorithm in Chinese dictionary-based Huffman compression to achieve better performance. The dynamic model is an extension of the static cascading model: as phrases are inserted into the dictionary, their frequency counts are updated, yielding a dynamic Huffman tree with variable-length output tokens. We propose a new method to capture the "LZW dictionary" by picking up dictionary entries during decompression. The general idea is to add delimiters during decompression so that the decompressed file is segmented into phrases that reflect how the LZW compressor used its dictionary to encode the source. The adaptive cascading model can be thought of as an extension of Chinese LZW compression. Since header size is an important performance bottleneck in the static cascading model, the adaptive cascading model addresses this issue: the LZW compressor outputs not a fixed-length token but a variable-length Huffman code from the Huffman tree. Such a compressor is expected to achieve very good compression performance. In our adaptive cascading model we choose LZW over LZSS because the LZW algorithm preserves more information than LZSS does, a characteristic found to be very useful in helping Chinese compressors attain better performance.
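The cascading idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation; all function names are illustrative, and the delimiter step is simulated by keeping each decoded dictionary entry as a separate phrase. It shows the three pieces the abstract walks through: LZW compression, recovering the phrase segmentation during decompression, and building a Huffman code over the phrase frequencies.

```python
import heapq
from collections import Counter

def lzw_compress(data: str):
    """Standard LZW: return the list of emitted dictionary indices."""
    dictionary = {chr(i): i for i in range(256)}
    w, out = "", []
    for c in data:
        wc = w + c
        if wc in dictionary:
            w = wc
        else:
            out.append(dictionary[w])
            dictionary[wc] = len(dictionary)  # new phrase entry
            w = c
    if w:
        out.append(dictionary[w])
    return out

def lzw_decompress_segmented(codes):
    """Decompress, but keep each decoded dictionary entry as a separate
    phrase -- the 'delimiter' idea: the phrase list records exactly how
    the compressor segmented the source."""
    dictionary = {i: chr(i) for i in range(256)}
    w = dictionary[codes[0]]
    phrases = [w]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        else:                      # k == len(dictionary) corner case
            entry = w + w[0]
        phrases.append(entry)
        dictionary[len(dictionary)] = w + entry[0]
        w = entry
    return phrases

def huffman_code(freqs):
    """Map each symbol (phrase) to a prefix-free bitstring."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, n, merged))
        n += 1
    return heap[0][2]

text = "abababcabab"
codes = lzw_compress(text)
phrases = lzw_decompress_segmented(codes)   # phrase-level segmentation
code_table = huffman_code(Counter(phrases)) # variable-length output tokens
```

In this sketch the phrase frequencies drive the Huffman tree, so frequent dictionary phrases receive short codes; the adaptive model in the paper goes further by updating those counts while the dictionary is still being built.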