Real-time traversal in grammar-based compressed files
L. Gąsieniec, R. Kolpakov, I. Potapov, P. Sant
Proceedings. Data Compression Conference, 2005-03-29, p. 458
DOI: 10.1109/DCC.2005.78 (https://doi.org/10.1109/DCC.2005.78)
Citations: 51
Abstract
Summary form only given. In text compression applications, it is important to be able to process compressed data without complete decompression, so it is crucial to study compression methods that allow time- and space-efficient access to any fragment of a compressed file. We study the real-time recovery of consecutive symbols from compressed files in the context of grammar-based compression. In this setting, a compressed text is represented as a small (a few Kb) dictionary D containing a set of code words, and a very long (a few Mb) string of symbols drawn from D. The space efficiency of this kind of compression is comparable with that of standard compression methods based on the Lempel-Ziv approach. We show that one can visit consecutive symbols of the original text, moving from one symbol to the next in constant time and using O(|D|) extra space. This algorithm improves on the on-line linear (amortised) time algorithm presented in (L. Gąsieniec et al., Proc. 13th Int. Symp. on Fund. of Comp. Theo., LNCS, vol. 2138, pp. 138-152, 2001).
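To make the setting concrete, the following is a minimal sketch of left-to-right traversal over a grammar-compressed text. It is not the constant-time algorithm announced in the abstract: it is a simple stack-based baseline whose delay between two consecutive symbols can grow with the nesting depth of the dictionary. The toy dictionary `D`, the two-symbol rule format, and the function name `traverse` are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy dictionary D: each code word expands to two symbols,
# each of which is either another code word or a single terminal character.
D: Dict[str, Tuple[str, str]] = {
    "X": ("a", "b"),  # X expands to "ab"
    "Y": ("X", "X"),  # Y expands to "abab"
    "Z": ("Y", "c"),  # Z expands to "ababc"
}


def traverse(sequence: List[str], rules: Dict[str, Tuple[str, str]]) -> Iterator[str]:
    """Lazily yield the symbols of the original text, left to right.

    Baseline only: the explicit stack holds the pending right-hand parts of
    the rules being expanded, so the work between two yielded symbols is
    bounded by the nesting depth of the rules, not by a constant.
    """
    for symbol in sequence:
        stack = [symbol]
        while stack:
            top = stack.pop()
            if top in rules:
                left, right = rules[top]
                # Push right first so that left is expanded first.
                stack.append(right)
                stack.append(left)
            else:
                # A terminal character of the original text.
                yield top


# The long string over dictionary symbols (very short here for illustration).
compressed = ["Z", "X", "Y"]
print("".join(traverse(compressed, D)))  # prints "ababcababab"
```

Because `traverse` is a generator, a caller can stop after any prefix without expanding the rest of the text. For an acyclic dictionary the stack never holds more entries than the nesting depth of the rules plus one, so the extra space stays within O(|D|); the result stated in the abstract is that the per-symbol delay can also be made constant.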