Multi-lingual cascading text compressors for WWW

Chi-Hung Chi
{"title":"Multi-lingual cascading text compressors for WWW","authors":"Chi-Hung Chi","doi":"10.1109/ITCC.2000.844279","DOIUrl":null,"url":null,"abstract":"Global sharing and distribution of information on the Internet result in a great demand for efficient multi-lingual text compression for Web servers and proxy implementations. Current text compressors such as Huffman coding, Lempel-Ziv (LZ) variants, and LZ-Huffman cascading fail to perform efficiently because of the mis-matched character sampling size and the large character set of multilingual languages. Our previous research has shown that a better compression ratio can be obtained by re-adjusting the character sampling rate. We investigate the cascading of LZ variants to Huffman coding for multilingual documents. Two basic approaches, static and dynamic dictionaries, are proposed. Techniques for reducing the dictionary overhead are also suggested. Based on our multi-lingual corpus, our adaptive cascading scheme can perform better than the well-known cascading compressor, gzip, by an average of about 20%.","PeriodicalId":146581,"journal":{"name":"Proceedings International Conference on Information Technology: Coding and Computing (Cat. No.PR00540)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2000-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings International Conference on Information Technology: Coding and Computing (Cat. No.PR00540)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITCC.2000.844279","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Global sharing and distribution of information on the Internet result in a great demand for efficient multi-lingual text compression for Web servers and proxy implementations. Current text compressors such as Huffman coding, Lempel-Ziv (LZ) variants, and LZ-Huffman cascading fail to perform efficiently because of the mis-matched character sampling size and the large character set of multilingual languages. Our previous research has shown that a better compression ratio can be obtained by re-adjusting the character sampling rate. We investigate the cascading of LZ variants to Huffman coding for multilingual documents. Two basic approaches, static and dynamic dictionaries, are proposed. Techniques for reducing the dictionary overhead are also suggested. Based on our multi-lingual corpus, our adaptive cascading scheme can perform better than the well-known cascading compressor, gzip, by an average of about 20%.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
用于WWW的多语言级联文本压缩器
Internet上信息的全局共享和分发导致对Web服务器和代理实现的高效多语言文本压缩的巨大需求。当前的文本压缩器如Huffman编码、Lempel-Ziv (LZ)变体和LZ-Huffman级联等由于字符采样大小不匹配和多语言语言的大字符集而无法有效执行。我们之前的研究表明,通过重新调整字符采样率可以获得更好的压缩比。我们研究了多语言文档的LZ变体到霍夫曼编码的级联。提出了静态字典和动态字典两种基本方法。还建议了减少字典开销的技术。基于我们的多语言语料库,我们的自适应级联方案比著名的级联压缩器gzip的性能平均提高约20%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Simple collusion-secure fingerprinting schemes for images Object-relational database management system (ORDBMS) using frame model approach Dynamic adaptive forward error control framework for image transmission over lossy networks General-purpose processor Huffman encoding extension A robust parallel architecture for adaptive color quantization
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1