Annual Symposium on Combinatorial Pattern Matching: Latest Papers

Fully-functional bidirectional Burrows-Wheeler indexes
Pub Date : 2019-01-29 DOI: 10.4230/LIPIcs.CPM.2019.10
F. Cunial, D. Belazzougui
Given a string $T$ on an alphabet of size $\sigma$, we describe a bidirectional Burrows-Wheeler index that takes $O(|T|\log{\sigma})$ bits of space, and that supports the addition \emph{and removal} of one character, on the left or right side of any substring of $T$, in constant time. Previously known data structures that used the same space allowed constant-time addition to any substring of $T$, but they could support removal only from specific substrings of $T$. We also describe an index that supports bidirectional addition and removal in $O(\log{\log{|T|}})$ time, and that occupies a number of words proportional to the number of left and right extensions of the maximal repeats of $T$. We use such fully-functional indexes to implement bidirectional, frequency-aware, variable-order de Bruijn graphs in small space, with no upper bound on their order, and supporting natural criteria for increasing and decreasing the order during traversal.
Citations: 15
Space-Efficient Computation of the LCP Array from the Burrows-Wheeler Transform
Pub Date : 2019-01-16 DOI: 10.4230/LIPIcs.CPM.2019.7
N. Prezza, Giovanna Rosone
We show that the Longest Common Prefix Array of a text collection of total size $n$ on alphabet $[1, \sigma]$ can be computed from the Burrows-Wheeler transformed collection in $O(n \log \sigma)$ time using $o(n \log \sigma)$ bits of working space on top of the input and output. Our result improves (on small alphabets) and generalizes (to string collections) the previous solution from Beller et al., which required $O(n)$ bits of extra working space. We also show how to merge the BWTs of two collections of total size $n$ within the same time and space bounds. The procedure at the core of our algorithms can be used to enumerate suffix tree intervals in succinct space from the BWT, which is of independent interest. An engineered implementation of our first algorithm on the DNA alphabet induces the LCP of a large (16 GiB) collection of short (100 bases) reads at a rate of 2.92 megabases per second using in total 1.5 Bytes per base in RAM. Our second algorithm merges the BWTs of two short-reads collections of 8 GiB each at a rate of 1.7 megabases per second and uses 0.625 Bytes per base in RAM. An extension of this algorithm that also computes the LCP array of the merged collection processes the data at a rate of 1.48 megabases per second and uses 1.625 Bytes per base in RAM.
Citations: 9
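For readers unfamiliar with the two objects named in this abstract, the following minimal sketch builds the BWT and the LCP array of a single short string by explicit suffix sorting. It only illustrates the definitions: it is not the paper's algorithm, which works from the BWT in small space and never sorts suffixes this way. The function names and the `$` sentinel convention are assumptions made for the example.

```python
def suffix_array(t: str) -> list[int]:
    """Start positions of the suffixes of t in lexicographic order (naive sort)."""
    return sorted(range(len(t)), key=lambda i: t[i:])


def bwt(t: str, sa: list[int]) -> str:
    """Character preceding each sorted suffix; t is assumed to end with a unique '$'."""
    return "".join(t[i - 1] for i in sa)


def lcp_array(t: str, sa: list[int]) -> list[int]:
    """lcp[k] = length of the longest common prefix of the k-th and (k-1)-th sorted suffixes."""
    def common_prefix(x: str, y: str) -> int:
        n = 0
        while n < len(x) and n < len(y) and x[n] == y[n]:
            n += 1
        return n

    return [0] + [common_prefix(t[sa[k - 1]:], t[sa[k]:]) for k in range(1, len(sa))]
```

On the text `banana$`, the sorted suffixes are `$`, `a$`, `ana$`, `anana$`, `banana$`, `na$`, `nana$`, so the LCP array records shared prefixes such as `ana` (length 3) between neighbouring suffixes.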
Optimal Rank and Select Queries on Dictionary-Compressed Text
Pub Date : 2018-11-03 DOI: 10.4230/LIPIcs.CPM.2019.4
N. Prezza
We study the problem of supporting queries on a string $S$ of length $n$ within a space bounded by the size $\gamma$ of a string attractor for $S$. Recent works showed that random access on $S$ can be supported in optimal $O(\log(n/\gamma)/\log\log n)$ time within $O\left(\gamma\ \mathrm{polylog}\ n\right)$ space. In this paper, we extend this result to \emph{rank} and \emph{select} queries and provide lower bounds matching our upper bounds on alphabets of polylogarithmic size. Our solutions are given in the form of a space-time trade-off that is more general than the one previously known for grammars and that improves existing bounds on LZ77-compressed text by a $\log\log n$ time-factor in \emph{select} queries. We also provide matching lower and upper bounds for \emph{partial sum} and \emph{predecessor} queries within attractor-bounded space, and extend our lower bounds to encompass navigation of dictionary-compressed tree representations.
Citations: 14
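The rank and select queries discussed in this abstract can be stated in a few lines on an uncompressed string. The sketch below fixes one common convention (rank counts occurrences in a prefix, select returns a 0-based position); both the convention and the function names are assumptions for illustration, and the linear-time scans have nothing to do with the attractor-based data structure of the paper.

```python
def rank(s: str, c: str, i: int) -> int:
    """Number of occurrences of character c in the prefix s[:i]."""
    return s[:i].count(c)


def select(s: str, c: str, k: int) -> int:
    """0-based position of the k-th occurrence (k >= 1) of c in s, or -1 if there is none."""
    pos = -1
    for _ in range(k):
        pos = s.find(c, pos + 1)
        if pos == -1:
            return -1
    return pos
```

The two queries are inverses in the sense that `rank(s, c, select(s, c, k)) == k - 1` whenever the k-th occurrence exists.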
Approximating Approximate Pattern Matching
Pub Date : 2018-10-01 DOI: 10.4230/LIPIcs.CPM.2019.15
J. Studeny, P. Uznański
Given a text $T$ of length $n$ and a pattern $P$ of length $m$, the approximate pattern matching problem asks for computation of a particular \emph{distance} function between $P$ and every $m$-substring of $T$. We consider a $(1\pm\varepsilon)$ multiplicative approximation variant of this problem, for the $\ell_p$ distance function. In this paper, we describe two $(1+\varepsilon)$-approximate algorithms with a runtime of $\widetilde{O}(\frac{n}{\varepsilon})$ for all (constant) non-negative values of $p$. For constant $p \ge 1$ we show a deterministic $(1+\varepsilon)$-approximation algorithm. Previously, such run time was known only for the case of $\ell_1$ distance, by Gawrychowski and Uznanski [ICALP 2018] and only with a randomized algorithm. For constant $0 \le p \le 1$ we show a randomized algorithm for the $\ell_p$ distance, thereby providing a smooth tradeoff between algorithms of Kopelowitz and Porat [FOCS~2015, SOSA~2018] for Hamming distance (case of $p=0$) and of Gawrychowski and Uznanski for $\ell_1$ distance.
Citations: 4
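To make the problem statement concrete, the following sketch computes the exact distance between the pattern and every length-$m$ window of the text in $O(nm)$ time. It is the naive baseline that approximation algorithms aim to beat, not the method of the paper. The sketch assumes numeric sequences, treats $p = 0$ as Hamming distance, and uses the usual $(\sum |a-b|^p)^{1/p}$ convention for $p > 0$; the function name is hypothetical.

```python
def lp_distances(text: list[float], pattern: list[float], p: float) -> list[float]:
    """Exact l_p distance between pattern and each window of text (naive O(n*m))."""
    n, m = len(text), len(pattern)
    out = []
    for i in range(n - m + 1):
        window = text[i:i + m]
        if p == 0:
            # Hamming distance: number of mismatching positions.
            d = float(sum(1 for a, b in zip(window, pattern) if a != b))
        else:
            d = sum(abs(a - b) ** p for a, b in zip(window, pattern)) ** (1.0 / p)
        out.append(d)
    return out
```

A $(1\pm\varepsilon)$-approximate algorithm would be allowed to return, for each window, any value within a factor $(1\pm\varepsilon)$ of the corresponding entry of this list.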
Finding a Small Number of Colourful Components
Pub Date : 2018-08-10 DOI: 10.4230/LIPICS.CPM.2019.20
L. Bulteau, Konrad K. Dabrowski, G. Fertin, Matthew Johnson, D. Paulusma, Stéphane Vialette
A partition (V_1,...,V_k) of the vertex set of a graph G with a (not necessarily proper) colouring c is colourful if no two vertices in any V_i have the same colour and every set V_i induces a connected graph. The Colourful Partition problem, introduced by Adamaszek and Popa, is to decide whether a coloured graph (G,c) has a colourful partition of size at most k. This problem is related to the Colourful Components problem, introduced by He, Liu and Zhao, which is to decide whether a graph can be modified into a graph whose connected components form a colourful partition by deleting at most p edges. Despite the similarities in their definitions, we show that Colourful Partition and Colourful Components may have different complexities for restricted instances. We tighten known NP-hardness results for both problems by closing a number of complexity gaps. In addition, we prove new hardness and tractability results for Colourful Partition. In particular, we prove that deciding whether a coloured graph (G,c) has a colourful partition of size 2 is NP-complete for coloured planar bipartite graphs of maximum degree 3 and path-width 3, but polynomial-time solvable for coloured graphs of treewidth 2. Rather than performing an ad hoc study, we use our classical complexity results to guide us in undertaking a thorough parameterized study of Colourful Partition. We show that this leads to suitable parameters for obtaining FPT results and moreover prove that Colourful Components and Colourful Partition may have different parameterized complexities, depending on the chosen parameter.
Citations: 3
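The definition of a colourful partition in this abstract can be checked mechanically. The sketch below verifies a given candidate partition: every part must be non-empty, have pairwise distinct colours, and induce a connected subgraph, and the parts must cover each vertex exactly once. It only verifies a candidate; deciding whether a small colourful partition exists is the hard problem studied in the paper. The adjacency-dictionary representation and the function name are assumptions.

```python
from collections import deque


def is_colourful_partition(adj: dict, colour: dict, parts: list) -> bool:
    """Check that parts is a colourful partition of the graph given by adj."""
    seen = set()
    for raw in parts:
        part = set(raw)
        if not part or part & seen:
            return False  # empty part, or a vertex used twice
        seen |= part
        if len({colour[v] for v in part}) != len(part):
            return False  # two vertices of the part share a colour
        # Breadth-first search restricted to the part tests connectivity.
        start = next(iter(part))
        reached = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w in part and w not in reached:
                    reached.add(w)
                    queue.append(w)
        if reached != part:
            return False
    return seen == set(adj)
```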
Superstrings with multiplicities
Pub Date : 2018-07-02 DOI: 10.4230/LIPIcs.CPM.2018.21
Bastien Cazaux, Eric Rivals
A superstring of a set of words P = {s_1, ..., s_p} is a string that contains each word of P as a substring. Given P, the well-known Shortest Linear Superstring problem (SLS) asks for a shortest superstring of P. In a variant of SLS, called Multi-SLS, each word s_i comes with an integer m(i), its multiplicity, that sets a constraint on its number of occurrences, and the goal is to find a shortest superstring that contains at least m(i) occurrences of s_i. Multi-SLS generalizes SLS and is obviously as hard to solve, but it has been studied only in special cases (with words of length 2 or with a fixed number of words). The approximability of Multi-SLS in the general case remains open. Here, we study the approximability of Multi-SLS and that of the companion problem Multi-SCCS, which asks for a shortest cyclic cover instead of shortest superstring. First, we investigate the approximation of a greedy algorithm for maximizing the compression offered by a superstring or by a cyclic cover: the approximation ratio is 1/2 for Multi-SLS and 1 for Multi-SCCS. Then, we exhibit a linear time approximation algorithm, Concat-Greedy, and show it achieves a ratio of 4 regarding the superstring length. This demonstrates that for both measures Multi-SLS belongs to the class of APX problems.
Citations: 6
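The Multi-SLS feasibility condition can be written down directly. The sketch below checks whether a candidate string contains each word at least as many times as its multiplicity requires. It assumes that overlapping occurrences are counted separately and that words are non-empty, and the function names are hypothetical. It is a feasibility check only, not the greedy or Concat-Greedy algorithms of the paper.

```python
def count_occurrences(text: str, word: str) -> int:
    """Count occurrences of word in text, allowing overlaps."""
    count = 0
    start = text.find(word)
    while start != -1:
        count += 1
        start = text.find(word, start + 1)
    return count


def is_multi_superstring(candidate: str, multiplicity: dict) -> bool:
    """True if candidate contains every word w at least multiplicity[w] times."""
    return all(count_occurrences(candidate, w) >= m for w, m in multiplicity.items())
```

For instance, with multiplicities `{"ab": 2, "ba": 1}` the string `abab` is feasible, while `aba` is not because it contains `ab` only once.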
On the Maximum Colorful Arborescence Problem and Color Hierarchy Graph Structure
Pub Date : 2018-07-02 DOI: 10.4230/LIPIcs.CPM.2018.17
G. Fertin, J. Fradin, Christian Komusiewicz
Let G = (V, A) be a vertex-colored arc-weighted directed acyclic graph (DAG) rooted in some vertex r. The color hierarchy graph H(G) of G is defined as follows: V(H(G)) is the color set C of G, and H(G) has an arc from c to c' if G has an arc from a vertex of color c to a vertex of color c'. We study the Maximum Colorful Arborescence (MCA) problem, which takes as input a DAG G such that H(G) is also a DAG, and aims at finding in G a maximum-weight arborescence rooted in r in which no color appears more than once. The MCA problem models the de novo inference of unknown metabolites by mass spectrometry experiments. Although the problem was introduced ten years ago (under a different name), it was only recently pointed out that a crucial additional property in the problem definition was missing: by essence, H(G) must be a DAG. In this paper, we further investigate MCA under this new light and provide new algorithmic results for this problem, with a specific focus on fixed-parameter tractability (FPT) issues for different structural parameters of H(G). In particular, we show there exists an $O^*(3^{n_H})$ time algorithm for solving MCA, where $n_H$ is the number of vertices of indegree at least two in H(G), thereby improving the $O^*(3^{|C|})$ algorithm from Böcker et al. [Proc. ECCB '08]. We also prove that MCA is W[2]-hard relative to the treewidth $H_t$ of the underlying undirected graph of H(G), and further show that it is FPT relative to $H_t + \ell_C$, where $\ell_C := |V| - |C|$. 2012 ACM Subject Classification: F.2.2 Nonnumerical Algorithms and Problems, G.2.1 Combinatorics, G.2.2 Graph Theory
Citations: 3
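The color hierarchy graph is easy to build from its definition, and the requirement that it be a DAG can be tested with a standard topological-sort argument. The sketch below is only an illustration of the definition under assumed representations (arcs as pairs, colors as a dictionary) with hypothetical function names; it does not solve MCA.

```python
def color_hierarchy_graph(arcs: list, color: dict) -> tuple:
    """Return (colors, arcs of H(G)): one arc (c, c2) per pair of colors joined by an arc of G."""
    h_arcs = {(color[u], color[v]) for u, v in arcs}
    return set(color.values()), h_arcs


def is_dag(vertices: set, arcs: set) -> bool:
    """Kahn-style check: a graph is acyclic iff all vertices can be removed in topological order."""
    indeg = {v: 0 for v in vertices}
    succ = {v: [] for v in vertices}
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    stack = [v for v in vertices if indeg[v] == 0]
    removed = 0
    while stack:
        u = stack.pop()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    return removed == len(vertices)
```

Note that G itself being a DAG does not guarantee that H(G) is one: arcs blue to green and green to blue can both arise from different vertex pairs, which is exactly the situation the problem definition excludes.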
Lyndon Factorization of Grammar Compressed Texts Revisited
Pub Date : 2018-05-01 DOI: 10.4230/LIPIcs.CPM.2018.24
Isamu Furuya, Yuto Nakashima, I. Tomohiro, Shunsuke Inenaga, H. Bannai, M. Takeda
We revisit the problem of computing the Lyndon factorization of a string w of length N which is given as a straight line program (SLP) of size n. For this problem, we show a new algorithm which runs in O(P(n, N) + Q(n, N)n log log N) time and O(n log N + S(n, N)) space where P(n, N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Our algorithm improves the algorithm proposed by I et al. (TCS '17), and can be more efficient than the O(N)-time solution by Duval (J. Algorithms '83) when w is highly compressible.
Citations: 4
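The $O(N)$-time baseline mentioned in this abstract is Duval's algorithm, which factorizes an uncompressed string into a non-increasing sequence of Lyndon words. A minimal sketch of that classical algorithm follows; it operates on the plain string, not on an SLP, so it illustrates the baseline rather than the compressed-space algorithm of the paper.

```python
def lyndon_factorization(w: str) -> list[str]:
    """Duval's algorithm: split w into Lyndon words f1 >= f2 >= ... >= fk in linear time."""
    factors = []
    n = len(w)
    i = 0
    while i < n:
        # Invariant: w[i:j] is a power of a Lyndon word of length j - k,
        # possibly followed by a proper prefix of that word.
        j, k = i + 1, i
        while j < n and w[k] <= w[j]:
            if w[k] < w[j]:
                k = i      # w[i:j+1] is itself a Lyndon word; restart the comparison
            else:
                k += 1     # still repeating the current period
            j += 1
        # Emit every complete copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(w[i:i + j - k])
            i += j - k
    return factors
```

For example, `banana` factorizes as `b`, `an`, `an`, `a`.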
Computing longest common square subsequences
Pub Date : 2018-05-01 DOI: 10.4230/LIPIcs.CPM.2018.15
Takafumi Inoue, Shunsuke Inenaga, Heikki Hyyrö, H. Bannai, M. Takeda
A square is a non-empty string of form YY. The longest common square subsequence (LCSqS) problem is to compute a longest square occurring as a subsequence in two given strings A and B. We show that the problem can easily be solved in O(n^6) time or O(|M|n^4) time with O(n^4) space, where n is the length of the strings and M is the set of matching points between A and B. Then, we show that the problem can also be solved in O(sigma |M|^3 + n) time and O(|M|^2 + n) space, or in O(|M|^3 log^2 n log log n + n) time with O(|M|^3 + n) space, where sigma is the number of distinct characters occurring in A and B. We also study lower bounds for the LCSqS problem for two or more strings.
Citations: 10
Longest Lyndon Substring After Edit
Pub Date : 2018-05-01 DOI: 10.4230/LIPIcs.CPM.2018.19
Y. Urabe, Yuto Nakashima, Shunsuke Inenaga, H. Bannai, M. Takeda
The longest Lyndon substring of a string T is the longest substring of T which is a Lyndon word. LLS(T) denotes the length of the longest Lyndon substring of a string T. In this paper, we consider computing LLS(T') where T' is an edited string formed from T. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(log n) time for any single character edit. We also consider a version of the problem with block edits, i.e., a substring of T is replaced by a given string of length l. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(l log sigma + log n) time for any block edit where sigma is the number of distinct characters in T. We can modify our algorithm so as to output all the longest Lyndon substrings of T' for both problems.
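To make the quantity LLS(T') concrete, here is a brute-force baseline that follows the definition directly. It is not the preprocessing-based algorithm from the abstract. It assumes the standard definition of a Lyndon word (a non-empty word that is strictly smaller, lexicographically, than each of its proper non-empty suffixes) and illustrates only one kind of single-character edit, a substitution. The names `is_lyndon`, `lls`, and `lls_after_substitution` are hypothetical.

```python
def is_lyndon(w):
    """True if w is non-empty and strictly smaller than every proper suffix."""
    return len(w) > 0 and all(w < w[k:] for k in range(1, len(w)))


def lls(t):
    """Length of the longest Lyndon substring of t, by exhaustive search."""
    n = len(t)
    # Scan candidate lengths from longest to shortest and stop at the first hit.
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            if is_lyndon(t[start:start + length]):
                return length
    return 0


def lls_after_substitution(t, pos, ch):
    """LLS(T') where T' is t with the character at index pos replaced by ch."""
    edited = t[:pos] + ch + t[pos + 1:]
    # Recompute from scratch; the paper answers such queries without doing this.
    return lls(edited)


if __name__ == "__main__":
    print(lls("banana"))
    print(lls_after_substitution("banana", 0, "a"))
```

Each query here rebuilds and rescans the whole edited string, which is the cost the O(log n) query time in the abstract avoids after O(n) preprocessing.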
Citations: 10