Fine-Grained Complexity of Analyzing Compressed Data: Quantifying Improvements over Decompress-and-Solve

Amir Abboud, A. Backurs, K. Bringmann, Marvin Künnemann
2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), published 2017-10-01. DOI: 10.1109/FOCS.2017.26 (https://doi.org/10.1109/FOCS.2017.26)
Citations: 32

Abstract

Can we analyze data without decompressing it? As our data keeps growing, understanding the time complexity of problems on compressed inputs, rather than in convenient uncompressed forms, becomes more and more relevant. Suppose we are given a compression of size n of data that originally has size N, and we want to solve a problem with time complexity T(⋅). The naïve strategy of decompress-and-solve gives time T(N), whereas the gold standard is time T(n): to analyze the compression as efficiently as if the original data was small.

We restrict our attention to data in the form of a string (text, files, genomes, etc.) and study the most ubiquitous tasks. While the challenge might seem to depend heavily on the specific compression scheme, most methods of practical relevance (Lempel-Ziv-family, dictionary methods, and others) can be unified under the elegant notion of Grammar-Compressions. A vast literature, across many disciplines, established this as an influential notion for algorithm design.

We introduce a direly needed framework for proving (conditional) lower bounds in this field, allowing us to assess whether decompress-and-solve can be improved, and by how much. Our main results are:

• The O(nN√log(N/n)) bound for LCS and the O(min(N log N, nM)) bound for Pattern Matching with Wildcards are optimal up to N^{o(1)} factors, under the Strong Exponential Time Hypothesis. (Here, M denotes the uncompressed length of the compressed pattern.)
• Decompress-and-solve is essentially optimal for Context-Free Grammar Parsing and RNA Folding, under the k-Clique conjecture.
• We give an algorithm showing that decompress-and-solve is not optimal for Disjointness.
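To make the contrast between time T(N) and time T(n) concrete, the following is a minimal sketch of a grammar compression in the form of a straight-line program, where each rule is either one character or the concatenation of two earlier symbols. It is an illustration only and is not taken from the paper: the grammar `RULES` and the helpers `decompress`, `expanded_length`, and `char_at` are hypothetical names chosen here. The length of the text and single-character lookups are answered by walking the grammar, without building the uncompressed string.

```python
from functools import lru_cache

# Hypothetical straight-line program (SLP): each rule is either a single
# terminal character or a pair of previously defined nonterminals.
RULES = {
    "X0": "a",
    "X1": "b",
    "X2": ("X0", "X1"),  # ab
    "X3": ("X2", "X2"),  # abab
    "X4": ("X3", "X3"),  # abababab
    "X5": ("X4", "X2"),  # ababababab
}


def decompress(sym):
    """Decompress-and-solve baseline: build the full string (time ~ N)."""
    rule = RULES[sym]
    if isinstance(rule, str):
        return rule
    left, right = rule
    return decompress(left) + decompress(right)


@lru_cache(maxsize=None)
def expanded_length(sym):
    """Length of the expansion; each rule is evaluated once (time ~ n)."""
    rule = RULES[sym]
    if isinstance(rule, str):
        return 1
    left, right = rule
    return expanded_length(left) + expanded_length(right)


def char_at(sym, i):
    """Character at 0-based position i of the expansion of sym,
    found by descending the grammar instead of decompressing."""
    rule = RULES[sym]
    while not isinstance(rule, str):
        left, right = rule
        left_len = expanded_length(left)
        if i < left_len:
            sym = left
        else:
            sym, i = right, i - left_len
        rule = RULES[sym]
    return rule


if __name__ == "__main__":
    print(expanded_length("X5"))  # length computed on the grammar
    print(char_at("X5", 9))       # one character, no decompression
```

Length and random access are easy cases where working on the grammar is cheap. The results listed above concern harder problems such as LCS, where the question is how close an algorithm can get to this ideal.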