Fast & Space-Efficient Approximations of Language Edit Distance and RNA Folding: An Amnesic Dynamic Programming Approach

2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS). Pub Date: 2017-10-01. DOI: 10.1109/FOCS.2017.35

B. Saha


Citations: 16

Abstract



Dynamic programming is one of the most basic and systematic techniques for developing polynomial-time algorithms, with a very wide range of applications. However, it often suffers from high running time and space complexity due to (a) maintaining a table of solutions for a large number of sub-instances, and (b) combining and comparing these solutions to successively solve larger sub-instances. In this paper, we consider a canonical cubic-time, quadratic-space dynamic program and show how both its time and its space usage can be improved. As a result, we obtain fast small-space approximation algorithms for the fundamental problems of context-free grammar recognition (the basic computer science problem of parsing), language edit distance (a significant generalization of string edit distance and parsing), and RNA folding (a classical problem in bioinformatics). For these problems, ours are the first algorithms that break the cubic-time barrier of any combinatorial algorithm and the quadratic-space barrier of any algorithm, significantly improving upon their long-standing space and time complexities. Our technique also applies to many other problems, including string edit distance computation and finding the longest increasing subsequence.

Our improvements come from directly grinding the dynamic programming and looking through the lens of language edit distance, which generalizes both context-free grammar recognition and RNA folding. From known conditional lower bound results, neither of these problems can have an exact combinatorial algorithm (one that does not use fast matrix multiplication) running in truly subcubic time. Moreover, for language edit distance such an algorithm cannot exist even when nontrivial multiplicative approximation is allowed. We overcome this hurdle by designing an additive-approximation algorithm that, for any parameter $k > 0$, uses $O(nk\log n)$ space and $O(n^2 k\log n)$ time and provides an additive $O(\frac{n}{k}\log n)$-approximation. In particular, in $\tilde{O}(n)$ space and $\tilde{O}(n^2)$ time it can decide deterministically whether a string belongs to a context-free grammar or is $\epsilon$-far from it, for any constant $\epsilon > 0$.

We also improve the above results to obtain an algorithm that outputs an $\epsilon \cdot n$-additive approximation to the above problems with space complexity $O(n^{2/3}\log n)$. The space complexity remains sublinear in $n$, as long as $\epsilon = o(n^{-\frac{1}{4}})$. Moreover, we provide the first MapReduce and streaming algorithms for them with multiple passes and sublinear space complexity.
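
To make the "canonical cubic-time, quadratic-space dynamic program" concrete, here is a minimal sketch of the textbook interval recurrence for RNA folding, which maximizes the number of non-crossing complementary base pairs. This is the exact baseline that the abstract sets out to improve, not the amnesic approximation algorithm itself. The function name `rna_fold_max_pairs`, the restriction to Watson-Crick pairs (A-U, C-G), and the absence of a minimum loop length are assumptions made for illustration; the abstract does not fix these details.

```python
def rna_fold_max_pairs(s: str) -> int:
    """Maximum number of non-crossing complementary pairs in s.

    Textbook interval DP (assumed formulation, not taken from the paper):
    Theta(n^2) table entries, each filled in O(n) time, so Theta(n^3) overall.
    """
    # Assumption: only Watson-Crick pairs count; adjacent bases may pair.
    pairs = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    n = len(s)
    if n == 0:
        return 0

    # best[i][j] = max pairs inside s[i..j] (inclusive): the quadratic table.
    best = [[0] * n for _ in range(n)]

    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # Case 1: s[j] stays unpaired.
            value = best[i][j - 1]
            # Case 2: s[j] pairs with some s[t]; the interval splits into
            # the part left of t and the part enclosed by the pair (t, j).
            for t in range(i, j):
                if (s[t], s[j]) in pairs:
                    left = best[i][t - 1] if t > i else 0
                    inside = best[t + 1][j - 1] if t + 1 <= j - 1 else 0
                    value = max(value, left + inside + 1)
            best[i][j] = value

    return best[0][n - 1]
```

For instance, `rna_fold_max_pairs("ACGU")` nests the C-G pair inside the A-U pair. The full `best` table and the inner loop over split points `t` are the two costs, (a) and (b), that the abstract identifies as the source of the quadratic space and cubic time.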
