On the Use of Suffix Arrays for Memory-Efficient Lempel-Ziv Data Compression

Artur J. Ferreira, Arlindo L. Oliveira, Mário A. T. Figueiredo
Published in: 2009 Data Compression Conference, March 2009
DOI: 10.1109/DCC.2009.50 (https://doi.org/10.1109/DCC.2009.50)
Citations: 8

Abstract

Much research has been devoted to optimizing algorithms of the Lempel-Ziv (LZ) 77 family, in terms of both speed and memory requirements. Binary search trees and suffix trees (ST) are data structures that have often been used for this purpose, as they allow fast searches at the expense of memory usage. In recent years, there has been interest in suffix arrays (SA), due to their simplicity and low memory requirements. One key point is that an SA can solve the sub-string problem almost as efficiently as an ST, using less memory. This paper proposes two new SA-based algorithms for LZ encoding, which require no modifications on the decoder side. Experimental results on standard benchmarks show that our algorithms, though not faster, use 3 to 5 times less memory than their ST counterparts. Another important feature of our SA-based algorithms is that the amount of memory is independent of the text to search, so the memory that has to be allocated can be defined a priori. These features of low and predictable memory requirements are of the utmost importance in several scenarios, such as embedded systems, where memory is at a premium and speed is not critical. Finally, we point out that the new algorithms are general, in the sense that they are adequate for applications other than LZ compression, such as text retrieval and forward/backward sub-string search.
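
The abstract does not spell out the two proposed algorithms, so the following is only a minimal illustrative sketch of the general idea: an LZ77-style encoder that finds the longest match in the dictionary window by binary search over a suffix array. All names (build_suffix_array, longest_match, lz77_encode, lz77_decode), the (offset, length, next symbol) token layout, and the window sizes are assumptions made for this example, not the authors' implementation. The suffix array is built naively and rebuilt at every step for clarity, so the sketch says nothing about the speed or memory figures reported in the paper.

```python
from typing import List, Tuple


def build_suffix_array(text: bytes) -> List[int]:
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading symbols shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(window: bytes, sa: List[int], pattern: bytes) -> Tuple[int, int]:
    """Return (position in window, length) of the longest prefix of
    `pattern` that occurs in `window`, or (0, 0) if there is none."""
    # Binary search for the point where `pattern` would be inserted
    # among the sorted suffixes.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if window[sa[mid]:] < pattern:
            lo = mid + 1
        else:
            hi = mid
    # The suffix sharing the longest prefix with `pattern` is one of
    # the two neighbours of the insertion point.
    best_pos, best_len = 0, 0
    for idx in (lo - 1, lo):
        if 0 <= idx < len(sa):
            n = common_prefix_len(window[sa[idx]:], pattern)
            if n > best_len:
                best_pos, best_len = sa[idx], n
    return best_pos, best_len


def lz77_encode(data: bytes, window_size: int = 16,
                lookahead_size: int = 8) -> List[Tuple[int, int, int]]:
    """Encode `data` as (offset, length, next_symbol) tokens.
    Assumed simplification: matches lie entirely inside the window."""
    tokens = []
    i = 0
    while i < len(data):
        start = max(0, i - window_size)
        window = data[start:i]
        sa = build_suffix_array(window)  # rebuilt each step, for clarity only
        lookahead = data[i:i + lookahead_size]
        # Hold back one symbol so a literal always follows the match.
        pos, length = longest_match(window, sa, lookahead[:-1])
        offset = i - (start + pos) if length else 0
        tokens.append((offset, length, data[i + length]))
        i += length + 1
    return tokens


def lz77_decode(tokens: List[Tuple[int, int, int]]) -> bytes:
    """Plain LZ77 decoder: copy `length` symbols from `offset` back,
    then append the literal. It knows nothing about suffix arrays."""
    out = bytearray()
    for offset, length, sym in tokens:
        begin = len(out) - offset
        for k in range(length):
            out.append(out[begin + k])
        out.append(sym)
    return bytes(out)
```

In this sketch the decoder is an ordinary LZ77 decoder, which mirrors the abstract's remark that only the encoder changes. The suffix array holds exactly one integer per window position, so its size follows from window_size alone and not from the content being compressed.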