On the Use of Suffix Arrays for Memory-Efficient Lempel-Ziv Data Compression

2009 Data Compression Conference, Pub Date: 2009-03-01, DOI: 10.1109/DCC.2009.50

Artur J. Ferreira, Arlindo L. Oliveira, Mário A. T. Figueiredo


Citations: 8

Abstract

Much research has been devoted to optimizing algorithms of the Lempel-Ziv (LZ) 77 family, in terms of both speed and memory requirements. Binary search trees and suffix trees (ST) are data structures that have often been used for this purpose, as they allow fast searches at the expense of memory usage. In recent years, there has been interest in suffix arrays (SA), due to their simplicity and low memory requirements. A key point is that an SA can solve the sub-string problem almost as efficiently as an ST, using less memory. This paper proposes two new SA-based algorithms for LZ encoding, which require no modifications to the decoder. Experimental results on standard benchmarks show that our algorithms, though not faster, use 3 to 5 times less memory than their ST counterparts. Another important feature of our SA-based algorithms is that the amount of memory is independent of the text to search, so the memory that has to be allocated can be defined a priori. These features of low and predictable memory requirements are of the utmost importance in several scenarios, such as embedded systems, where memory is at a premium and speed is not critical. Finally, we point out that the new algorithms are general, in the sense that they are suitable for applications other than LZ compression, such as text retrieval and forward/backward sub-string search.
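To make the idea in the abstract concrete, the following is a minimal Python sketch of how a suffix array over the sliding window can find the longest match for an LZ77 encoder while leaving the decoder untouched. It is an illustration only and not either of the two algorithms proposed in the paper, whose details the abstract does not give. The function names, the `(position, length, literal)` token layout, and the window and lookahead sizes are all assumptions. The suffix array is built naively and rebuilt for every token, which is slow but keeps the sketch short.

```python
def build_suffix_array(text: bytes) -> list:
    """Start indices of all suffixes of `text`, in lexicographic order.
    Naive construction, chosen for brevity rather than speed."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that `a` and `b` share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(window: bytes, sa: list, lookahead: bytes) -> tuple:
    """Return (position, length) of the longest prefix of `lookahead`
    that occurs in `window`, using binary search over the suffix array."""
    lo, hi = 0, len(sa)
    while lo < hi:  # find where `lookahead` would sit among the sorted suffixes
        mid = (lo + hi) // 2
        if window[sa[mid]:] < lookahead:
            lo = mid + 1
        else:
            hi = mid
    best_pos, best_len = 0, 0
    # The suffix sharing the longest prefix is adjacent to that position.
    for k in (lo - 1, lo):
        if 0 <= k < len(sa):
            n = common_prefix_len(window[sa[k]:], lookahead)
            if n > best_len:
                best_pos, best_len = sa[k], n
    return best_pos, best_len


def lz77_encode(data: bytes, window_size: int = 32, lookahead_size: int = 8) -> list:
    """Encode `data` as (position, length, literal) tokens (assumed layout)."""
    tokens = []
    i = 0
    while i < len(data):
        start = max(0, i - window_size)
        window = data[start:i]
        sa = build_suffix_array(window)  # one integer per window position
        lookahead = data[i:i + lookahead_size]
        # Keep the last lookahead byte free so a literal always follows the match.
        pos, length = longest_match(window, sa, lookahead[:-1])
        tokens.append((pos, length, data[i + length]))
        i += length + 1
    return tokens


def lz77_decode(tokens: list, window_size: int = 32) -> bytes:
    """Plain LZ77 decoder; it has no knowledge of the suffix array."""
    out = bytearray()
    for pos, length, literal in tokens:
        start = max(0, len(out) - window_size)
        out += out[start + pos:start + pos + length]
        out.append(literal)
    return bytes(out)
```

In this sketch the search structure is a single array with one entry per window position, so its size is fixed by `window_size` and does not depend on the content being compressed, which mirrors the predictable-memory property the abstract describes. For example, `lz77_decode(lz77_encode(b"abracadabra abracadabra"))` reproduces the input.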

