Real-time traversal in grammar-based compressed files

L. Gąsieniec, R. Kolpakov, I. Potapov, P. Sant
{"title":"Real-time traversal in grammar-based compressed files","authors":"L. Gąsieniec, R. Kolpakov, I. Potapov, P. Sant","doi":"10.1109/DCC.2005.78","DOIUrl":null,"url":null,"abstract":"Summary form only given. In text compression applications, it is important to be able to process compressed data without requiring (complete) decompression. In this context it is crucial to study compression methods that allow time/space efficient access to any fragment of a compressed file without being forced to perform complete decompression. We study here the real-time recovery of consecutive symbols from compressed files, in the context of grammar-based compression. In this setting, a compressed text is represented as a small (a few Kb) dictionary D (containing a set of code words), and a very long (a few Mb) string based on symbols drawn from the dictionary D. The space efficiency of this kind of compression is comparable with standard compression methods based on the Lempel-Ziv approach. We show, that one can visit consecutive symbols of the original text, moving from one symbol to another in constant time and extra O(|D|) space. This algorithm is an improvement of the on-line linear (amortised) time algorithm presented in (L. Gasieniec et al, Proc. 13th Int. Symp. on Fund. of Comp. Theo., LNCS, vol.2138, p.138-152, 2001).","PeriodicalId":91161,"journal":{"name":"Proceedings. Data Compression Conference","volume":"9 1","pages":"458-"},"PeriodicalIF":0.0000,"publicationDate":"2005-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"51","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. Data Compression Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC.2005.78","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 51

Abstract

Summary form only given. In text compression applications, it is important to be able to process compressed data without requiring (complete) decompression. In this context it is crucial to study compression methods that allow time/space efficient access to any fragment of a compressed file without being forced to perform complete decompression. We study here the real-time recovery of consecutive symbols from compressed files, in the context of grammar-based compression. In this setting, a compressed text is represented as a small (a few Kb) dictionary D (containing a set of code words), and a very long (a few Mb) string based on symbols drawn from the dictionary D. The space efficiency of this kind of compression is comparable with standard compression methods based on the Lempel-Ziv approach. We show, that one can visit consecutive symbols of the original text, moving from one symbol to another in constant time and extra O(|D|) space. This algorithm is an improvement of the on-line linear (amortised) time algorithm presented in (L. Gasieniec et al, Proc. 13th Int. Symp. on Fund. of Comp. Theo., LNCS, vol.2138, p.138-152, 2001).
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
实时遍历基于语法的压缩文件
只提供摘要形式。在文本压缩应用程序中,能够处理压缩数据而不需要(完全)解压缩是很重要的。在这种情况下,研究压缩方法是至关重要的,这些方法允许时间/空间有效地访问压缩文件的任何片段,而不必强制执行完全解压缩。本文研究了基于语法压缩的压缩文件中连续符号的实时恢复。在这种情况下,压缩文本被表示为一个小的(几Kb)字典D(包含一组码字)和一个非常长的(几Mb)字符串(基于从字典D中绘制的符号)。这种压缩的空间效率与基于Lempel-Ziv方法的标准压缩方法相当。我们证明,一个人可以访问原始文本的连续符号,在恒定的时间和额外的O(|D|)空间内从一个符号移动到另一个符号。该算法是对(L. Gasieniec et al ., Proc. 13 Int)中提出的在线线性(摊平)时间算法的改进。计算机协会。在基金。西奥公司。生物医学工程学报,vol.2138, p.138-152, 2001)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Faster Maximal Exact Matches with Lazy LCP Evaluation. Recursive Prefix-Free Parsing for Building Big BWTs. PHONI: Streamed Matching Statistics with Multi-Genome References. Client-Driven Transmission of JPEG2000 Image Sequences Using Motion Compensated Conditional Replenishment GeneComp, a new reference-based compressor for SAM files.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1