Pub Date: 2019-01-29; DOI: 10.4230/LIPIcs.CPM.2019.10
F. Cunial, D. Belazzougui
Given a string $T$ on an alphabet of size $\sigma$, we describe a bidirectional Burrows-Wheeler index that takes $O(|T|\log{\sigma})$ bits of space, and that supports the addition \emph{and removal} of one character, on the left or right side of any substring of $T$, in constant time. Previously known data structures that used the same space allowed constant-time addition to any substring of $T$, but they could support removal only from specific substrings of $T$. We also describe an index that supports bidirectional addition and removal in $O(\log{\log{|T|}})$ time, and that occupies a number of words proportional to the number of left and right extensions of the maximal repeats of $T$. We use such fully-functional indexes to implement bidirectional, frequency-aware, variable-order de Bruijn graphs in small space, with no upper bound on their order, and supporting natural criteria for increasing and decreasing the order during traversal.
Title: Fully-functional bidirectional Burrows-Wheeler indexes. Venue: Annual Symposium on Combinatorial Pattern Matching.
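To make "addition of one character on the left" concrete, here is a minimal sketch of the classical backward-search step on a plain Burrows-Wheeler transform. It is not the constant-time structure of the paper: the BWT is built by naively sorting rotations, ranks are computed by scanning, and the names `build_bwt`, `make_index`, `extend_left` and `count_occurrences` are hypothetical. The sentinel `"\0"` is assumed not to occur in the text.

```python
def build_bwt(text):
    """Naive BWT: sort all rotations of text + sentinel, take last column."""
    s = text + "\0"  # assumption: "\0" is absent from text and sorts first
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def make_index(text):
    """Return the BWT and the C array (number of characters smaller than c)."""
    bwt = build_bwt(text)
    counts = {}
    for ch in bwt:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    return bwt, C


def extend_left(bwt, C, interval, c):
    """Map the half-open suffix-array interval of W to the interval of cW."""
    lo, hi = interval
    if c not in C:
        return (0, 0)
    # bwt.count(c, 0, k) plays the role of rank_c(k); a real index answers
    # this query with a succinct rank structure instead of a linear scan.
    return (C[c] + bwt.count(c, 0, lo), C[c] + bwt.count(c, 0, hi))


def count_occurrences(text, pattern):
    """Count occurrences of pattern by extending to the left, char by char."""
    bwt, C = make_index(text)
    interval = (0, len(bwt))
    for c in reversed(pattern):
        interval = extend_left(bwt, C, interval, c)
        if interval[0] >= interval[1]:
            return 0
    return interval[1] - interval[0]
```

For example, `count_occurrences("banana", "ana")` walks through the intervals of "a", "na" and "ana". Removing a character, which the abstract highlights as the hard direction, has no equally simple counterpart in this sketch.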
Pub Date: 2019-01-16; DOI: 10.4230/LIPIcs.CPM.2019.7
N. Prezza, Giovanna Rosone
We show that the Longest Common Prefix (LCP) array of a text collection of total size n on alphabet [1, σ] can be computed from the Burrows-Wheeler transformed collection in O(n log σ) time using o(n log σ) bits of working space on top of the input and output. Our result improves (on small alphabets) and generalizes (to string collections) the previous solution from Beller et al., which required O(n) bits of extra working space. We also show how to merge the BWTs of two collections of total size n within the same time and space bounds. The procedure at the core of our algorithms can be used to enumerate suffix tree intervals in succinct space from the BWT, which is of independent interest. An engineered implementation of our first algorithm on the DNA alphabet induces the LCP array of a large (16 GiB) collection of short (100 bases) reads at a rate of 2.92 megabases per second using in total 1.5 bytes per base in RAM. Our second algorithm merges the BWTs of two short-read collections of 8 GiB each at a rate of 1.7 megabases per second and uses 0.625 bytes per base in RAM. An extension of this algorithm that also computes the LCP array of the merged collection processes the data at a rate of 1.48 megabases per second and uses 1.625 bytes per base in RAM.
Title: Space-Efficient Computation of the LCP Array from the Burrows-Wheeler Transform. Venue: Annual Symposium on Combinatorial Pattern Matching.
Pub Date: 2018-11-03; DOI: 10.4230/LIPIcs.CPM.2019.4
N. Prezza
We study the problem of supporting queries on a string $S$ of length $n$ within a space bounded by the size $\gamma$ of a string attractor for $S$. Recent works showed that random access on $S$ can be supported in optimal $O(\log(n/\gamma)/\log\log n)$ time within $O\left(\gamma\, \mathrm{polylog}\, n\right)$ space. In this paper, we extend this result to \emph{rank} and \emph{select} queries and provide lower bounds matching our upper bounds on alphabets of polylogarithmic size. Our solutions are given in the form of a space-time trade-off that is more general than the one previously known for grammars and that improves existing bounds on LZ77-compressed text by a $\log\log n$ time-factor in \emph{select} queries. We also provide matching lower and upper bounds for \emph{partial sum} and \emph{predecessor} queries within attractor-bounded space, and extend our lower bounds to encompass navigation of dictionary-compressed tree representations.
Title: Optimal Rank and Select Queries on Dictionary-Compressed Text. Venue: Annual Symposium on Combinatorial Pattern Matching.
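The abstract does not define the two queries, so the following is only a reference sketch under the usual convention: rank counts the occurrences of a character in a prefix, and select returns the position of the j-th occurrence. Both functions scan the uncompressed string in linear time, which is precisely what the attractor-based structures avoid; the names `rank` and `select` and the 0-based position convention are assumptions.

```python
def rank(s, c, i):
    """Number of occurrences of character c in the prefix s[0:i]."""
    return s.count(c, 0, i)


def select(s, c, j):
    """0-based position of the j-th occurrence (j >= 1) of c in s, or -1."""
    pos = -1
    for _ in range(j):
        pos = s.find(c, pos + 1)
        if pos == -1:
            return -1
    return pos
```

The two queries are inverse to each other in the sense that `rank(s, c, select(s, c, j) + 1) == j` whenever the j-th occurrence exists.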
Pub Date: 2018-10-01; DOI: 10.4230/LIPIcs.CPM.2019.15
J. Studeny, P. Uznański
Given a text $T$ of length $n$ and a pattern $P$ of length $m$, the approximate pattern matching problem asks for computation of a particular \emph{distance} function between $P$ and every $m$-substring of $T$. We consider a $(1\pm\varepsilon)$ multiplicative approximation variant of this problem, for the $\ell_p$ distance function. In this paper, we describe two $(1+\varepsilon)$-approximate algorithms with a runtime of $\widetilde{O}(\frac{n}{\varepsilon})$ for all (constant) non-negative values of $p$. For constant $p \ge 1$ we show a deterministic $(1+\varepsilon)$-approximation algorithm. Previously, such run time was known only for the case of $\ell_1$ distance, by Gawrychowski and Uznański [ICALP 2018] and only with a randomized algorithm. For constant $0 \le p \le 1$ we show a randomized algorithm for the $\ell_p$ distance, thereby providing a smooth tradeoff between algorithms of Kopelowitz and Porat [FOCS~2015, SOSA~2018] for Hamming distance (case of $p=0$) and of Gawrychowski and Uznański for $\ell_1$ distance.
Title: Approximating Approximate Pattern Matching. Venue: Annual Symposium on Combinatorial Pattern Matching.
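As a point of reference for the quantity being approximated, the sketch below computes the exact distance between the pattern and every $m$-substring by brute force in $O(nm)$ time. It assumes numeric sequences, takes the $\ell_p$ distance to be $(\sum_i |x_i - y_i|^p)^{1/p}$ for $p > 0$, and treats $p = 0$ as the Hamming distance, as the abstract does; the function name `lp_distances` is hypothetical, and none of the paper's approximation techniques appear here.

```python
def lp_distances(text, pattern, p):
    """Exact l_p distance between pattern and every m-substring of text."""
    m = len(pattern)
    distances = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        if p == 0:
            # Hamming distance: number of mismatching positions.
            d = sum(1 for a, b in zip(window, pattern) if a != b)
        else:
            d = sum(abs(a - b) ** p for a, b in zip(window, pattern)) ** (1.0 / p)
        distances.append(d)
    return distances
```

A $(1+\varepsilon)$-approximate algorithm would be allowed to return, at each position, any value within a multiplicative factor $(1+\varepsilon)$ of the corresponding entry of this list.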
Pub Date: 2018-08-10; DOI: 10.4230/LIPICS.CPM.2019.20
L. Bulteau, Konrad K. Dabrowski, G. Fertin, Matthew Johnson, D. Paulusma, Stéphane Vialette
A partition (V_1,...,V_k) of the vertex set of a graph G with a (not necessarily proper) colouring c is colourful if no two vertices in any V_i have the same colour and every set V_i induces a connected graph. The Colourful Partition problem, introduced by Adamaszek and Popa, is to decide whether a coloured graph (G,c) has a colourful partition of size at most k. This problem is related to the Colourful Components problem, introduced by He, Liu and Zhao, which is to decide whether a graph can be modified into a graph whose connected components form a colourful partition by deleting at most p edges. Despite the similarities in their definitions, we show that Colourful Partition and Colourful Components may have different complexities for restricted instances. We tighten known NP-hardness results for both problems by closing a number of complexity gaps. In addition, we prove new hardness and tractability results for Colourful Partition. In particular, we prove that deciding whether a coloured graph (G,c) has a colourful partition of size 2 is NP-complete for coloured planar bipartite graphs of maximum degree 3 and path-width 3, but polynomial-time solvable for coloured graphs of treewidth 2. Rather than performing an ad hoc study, we use our classical complexity results to guide us in undertaking a thorough parameterized study of Colourful Partition. We show that this leads to suitable parameters for obtaining FPT results and moreover prove that Colourful Components and Colourful Partition may have different parameterized complexities, depending on the chosen parameter.
Title: Finding a Small Number of Colourful Components. Venue: Annual Symposium on Combinatorial Pattern Matching.
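The definition of a colourful partition can be checked mechanically. The sketch below only verifies a given candidate partition; it does not decide the NP-hard Colourful Partition problem. The graph representation (a dictionary from each vertex to its set of neighbours) and the name `is_colourful_partition` are assumptions made for illustration.

```python
def is_colourful_partition(adj, colour, parts):
    """Check that parts is a colourful partition of the graph adj.

    adj:    dict mapping each vertex to the set of its neighbours
    colour: dict mapping each vertex to its colour
    parts:  iterable of vertex collections (V_1, ..., V_k)
    """
    seen = set()
    for part in parts:
        part = set(part)
        if not part or part & seen:
            return False  # empty part, or parts overlap
        seen |= part
        colours = [colour[v] for v in part]
        if len(colours) != len(set(colours)):
            return False  # two vertices in the part share a colour
        # Connectivity of the induced subgraph, via depth-first search.
        start = next(iter(part))
        reached, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w in part and w not in reached:
                    reached.add(w)
                    stack.append(w)
        if reached != part:
            return False
    return seen == set(adj)  # every vertex must be covered
```

On the path 0-1-2 with colours 1, 2, 1, the partition ({0, 1}, {2}) is colourful, whereas the single part {0, 1, 2} is not because colour 1 repeats.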
Pub Date: 2018-07-02; DOI: 10.4230/LIPIcs.CPM.2018.21
Bastien Cazaux, Eric Rivals
A superstring of a set of words P = {s_1, ..., s_p} is a string that contains each word of P as a substring. Given P, the well-known Shortest Linear Superstring problem (SLS) asks for a shortest superstring of P. In a variant of SLS, called Multi-SLS, each word s_i comes with an integer m(i), its multiplicity, that sets a constraint on its number of occurrences, and the goal is to find a shortest superstring that contains at least m(i) occurrences of s_i. Multi-SLS generalizes SLS and is obviously as hard to solve, but it has been studied only in special cases (with words of length 2 or with a fixed number of words). The approximability of Multi-SLS in the general case remains open. Here, we study the approximability of Multi-SLS and that of the companion problem Multi-SCCS, which asks for a shortest cyclic cover instead of a shortest superstring. First, we investigate the approximation of a greedy algorithm for maximizing the compression offered by a superstring or by a cyclic cover: the approximation ratio is 1/2 for Multi-SLS and 1 for Multi-SCCS. Then, we exhibit a linear time approximation algorithm, Concat-Greedy, and show it achieves a ratio of 4 regarding the superstring length. This demonstrates that for both measures Multi-SLS belongs to the class of APX problems.
Title: Superstrings with multiplicities. Venue: Annual Symposium on Combinatorial Pattern Matching.
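The Multi-SLS feasibility condition can be illustrated with a small checker. This sketch assumes that occurrences may overlap (so "aa" occurs twice in "aaa") and that words are non-empty; the names `count_occurrences` and `is_multi_superstring` are hypothetical, and the code is not the Concat-Greedy algorithm from the paper.

```python
def count_occurrences(text, word):
    """Count (possibly overlapping) occurrences of a non-empty word in text."""
    count = 0
    start = text.find(word)
    while start != -1:
        count += 1
        start = text.find(word, start + 1)
    return count


def is_multi_superstring(candidate, multiplicity):
    """Check that candidate contains each word s_i at least m(i) times.

    multiplicity: dict mapping each word s_i to its required count m(i)
    """
    return all(
        count_occurrences(candidate, word) >= required
        for word, required in multiplicity.items()
    )
```

For instance, "abab" satisfies the instance {"ab": 2, "ba": 1}, while "aba" does not satisfy {"ab": 2}.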
Pub Date: 2018-07-02; DOI: 10.4230/LIPIcs.CPM.2018.17
G. Fertin, J. Fradin, Christian Komusiewicz
Let G = (V, A) be a vertex-colored arc-weighted directed acyclic graph (DAG) rooted in some vertex r. The color hierarchy graph H(G) of G is defined as follows: V(H(G)) is the color set C of G, and H(G) has an arc from c to c' if G has an arc from a vertex of color c to a vertex of color c'. We study the Maximum Colorful Arborescence (MCA) problem, which takes as input a DAG G such that H(G) is also a DAG, and aims at finding in G a maximum-weight arborescence rooted in r in which no color appears more than once. The MCA problem models the de novo inference of unknown metabolites by mass spectrometry experiments. Although the problem was introduced ten years ago (under a different name), it was only recently pointed out that a crucial additional property in the problem definition was missing: by essence, H(G) must be a DAG. In this paper, we further investigate MCA under this new light and provide new algorithmic results for this problem, with a specific focus on fixed-parameter tractability (FPT) issues for different structural parameters of H(G). In particular, we show there exists an O*(3^{n_H}) time algorithm for solving MCA, where n_H is the number of vertices of indegree at least two in H(G), thereby improving the O*(3^{|C|}) algorithm from Böcker et al. [Proc. ECCB '08]. We also prove that MCA is W[2]-hard relative to the treewidth H_t of the underlying undirected graph of H(G), and further show that it is FPT relative to H_t + l_C, where l_C := |V| − |C|. 2012 ACM Subject Classification: F.2.2 Nonnumerical Algorithms and Problems, G.2.1 Combinatorics, G.2.2 Graph Theory.
Title: On the Maximum Colorful Arborescence Problem and Color Hierarchy Graph Structure. Venue: Annual Symposium on Combinatorial Pattern Matching.
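The color hierarchy graph and the parameter counting vertices of indegree at least two in H(G) can be computed directly from the definition. The sketch below assumes G is given as a list of arcs plus a vertex-to-color dictionary; the function names are hypothetical, and the code only builds H(G) and checks the DAG precondition, it does not solve MCA.

```python
from collections import defaultdict, deque


def color_hierarchy_arcs(arcs, color):
    """Arcs of H(G): (c, c') whenever G has an arc from color c to color c'."""
    return {(color[u], color[v]) for u, v in arcs}


def is_dag(nodes, arcs):
    """Kahn's algorithm: True iff the directed graph (nodes, arcs) is acyclic."""
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for a, b in arcs:
        successors[a].append(b)
        indegree[b] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for s in successors[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)
    return removed == len(indegree)


def count_indegree_at_least_two(arcs):
    """Number of vertices with indegree >= 2 (the parameter called n_H)."""
    indegree = defaultdict(int)
    for _, b in arcs:
        indegree[b] += 1
    return sum(1 for d in indegree.values() if d >= 2)
```

Note that G being a DAG does not imply that H(G) is one: the path x → y → z with colors 1, 2, 1 yields the arcs (1, 2) and (2, 1) in H(G), which form a cycle. This is why the acyclicity of H(G) has to be stated as a separate requirement.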
Pub Date: 2018-05-01; DOI: 10.4230/LIPIcs.CPM.2018.24
Isamu Furuya, Yuto Nakashima, I. Tomohiro, Shunsuke Inenaga, H. Bannai, M. Takeda
We revisit the problem of computing the Lyndon factorization of a string w of length N which is given as a straight line program (SLP) of size n. For this problem, we show a new algorithm which runs in O(P(n, N) + Q(n, N)n log log N) time and O(n log N + S(n, N)) space where P(n, N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Our algorithm improves the algorithm proposed by I et al. (TCS '17), and can be more efficient than the O(N)-time solution by Duval (J. Algorithms '83) when w is highly compressible.
Title: Lyndon Factorization of Grammar Compressed Texts Revisited. Venue: Annual Symposium on Combinatorial Pattern Matching.
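For orientation, the O(N)-time baseline attributed to Duval in the abstract can be sketched on an uncompressed string as follows. This is the standard textbook formulation operating on the plain string, not the SLP-based algorithm of the paper; the function name `lyndon_factorization` is an assumption.

```python
def lyndon_factorization(w):
    """Duval's algorithm: factor w into a non-increasing list of Lyndon words."""
    factors = []
    i, n = 0, len(w)
    while i < n:
        # w[i:j] is a prefix of a power of a Lyndon word of length j - k.
        j, k = i + 1, i
        while j < n and w[k] <= w[j]:
            if w[k] < w[j]:
                k = i      # the whole of w[i:j+1] is a single Lyndon word
            else:
                k += 1     # keep matching the current period
            j += 1
        # Emit every complete repetition of the Lyndon word found.
        while i <= k:
            factors.append(w[i:i + j - k])
            i += j - k
    return factors
```

On "banana" the loop emits "b", then two copies of "an", then "a". The compressed setting matters when N is exponentially larger than n, in which case even reading w is too expensive.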
Pub Date: 2018-05-01; DOI: 10.4230/LIPIcs.CPM.2018.15
Takafumi Inoue, Shunsuke Inenaga, Heikki Hyyrö, H. Bannai, M. Takeda
A square is a non-empty string of the form YY. The longest common square subsequence (LCSqS) problem is to compute a longest square occurring as a subsequence in two given strings A and B. We show that the problem can easily be solved in O(n^6) time or O(|M|n^4) time with O(n^4) space, where n is the length of the strings and M is the set of matching points between A and B. Then, we show that the problem can also be solved in O(sigma |M|^3 + n) time and O(|M|^2 + n) space, or in O(|M|^3 log^2 n log log n + n) time with O(|M|^3 + n) space, where sigma is the number of distinct characters occurring in A and B. We also study lower bounds for the LCSqS problem for two or more strings.
Title: Computing longest common square subsequences. Venue: Annual Symposium on Combinatorial Pattern Matching.
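One straightforward way to reach a bound of the O(n^6) kind is sketched below; whether it coincides with the paper's simple algorithm is an assumption. The observation is that YY is a common subsequence of A and B exactly when, for some split points, Y is a common subsequence of the four strings A[:i], A[i:], B[:j] and B[j:]. The helper names `lcs4` and `lcsqs_length` are hypothetical, and the memoized recursion is meant for short inputs only.

```python
from functools import lru_cache


def lcs4(s1, s2, s3, s4):
    """Length of a longest common subsequence of four strings."""
    @lru_cache(maxsize=None)
    def go(a, b, c, d):
        if a == len(s1) or b == len(s2) or c == len(s3) or d == len(s4):
            return 0
        best = 0
        if s1[a] == s2[b] == s3[c] == s4[d]:
            best = 1 + go(a + 1, b + 1, c + 1, d + 1)
        # Otherwise (or additionally) skip one character in one string.
        return max(best,
                   go(a + 1, b, c, d), go(a, b + 1, c, d),
                   go(a, b, c + 1, d), go(a, b, c, d + 1))
    return go(0, 0, 0, 0)


def lcsqs_length(A, B):
    """Length of a longest common square subsequence of A and B."""
    best = 0
    for i in range(1, len(A)):          # split A into A[:i] and A[i:]
        for j in range(1, len(B)):      # split B into B[:j] and B[j:]
            best = max(best, lcs4(A[:i], A[i:], B[:j], B[j:]))
    return 2 * best
```

There are O(n^2) pairs of split points and each four-string LCS table has O(n^4) cells, which gives the O(n^6) total. The faster algorithms in the abstract exploit the sparsity of the matching points M instead.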
Pub Date: 2018-05-01; DOI: 10.4230/LIPIcs.CPM.2018.19
Y. Urabe, Yuto Nakashima, Shunsuke Inenaga, H. Bannai, M. Takeda
The longest Lyndon substring of a string T is the longest substring of T which is a Lyndon word. LLS(T) denotes the length of the longest Lyndon substring of a string T. In this paper, we consider computing LLS(T') where T' is an edited string formed from T. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(log n) time for any single character edit. We also consider a version of the problem with block edits, i.e., a substring of T is replaced by a given string of length l. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(l log sigma + log n) time for any block edit where sigma is the number of distinct characters in T. We can modify our algorithm so as to output all the longest Lyndon substrings of T' for both problems.
Title: Longest Lyndon Substring After Edit. Venue: Annual Symposium on Combinatorial Pattern Matching.
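To pin down what LLS(T') means, the following brute-force sketch recomputes the answer from scratch after an edit. It assumes the edit is a single-character substitution (the paper's notion of edit may be broader) and uses the characterization of a Lyndon word as a non-empty string that is strictly smaller than all of its proper suffixes. The names `is_lyndon`, `lls` and `lls_after_substitution` are hypothetical, and the running time is polynomial rather than the O(log n) per query reported in the abstract.

```python
def is_lyndon(w):
    """True iff w is non-empty and strictly smaller than every proper suffix."""
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def lls(t):
    """Length of the longest Lyndon substring of t (0 for the empty string)."""
    n = len(t)
    for length in range(n, 0, -1):       # try the longest candidates first
        for i in range(n - length + 1):
            if is_lyndon(t[i:i + length]):
                return length
    return 0


def lls_after_substitution(t, pos, ch):
    """LLS of the string obtained by replacing t[pos] with character ch."""
    return lls(t[:pos] + ch + t[pos + 1:])
```

For "banana" the longest Lyndon substring is "an"; replacing the first character by "a" gives "aanana", whose prefix "aanan" is a Lyndon word of length 5. A single edit can therefore change the answer substantially, which is what makes fast recomputation non-trivial.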