Annual Symposium on Combinatorial Pattern Matching: Latest Papers

Parameterized Algorithms for Matrix Completion With Radius Constraints

Annual Symposium on Combinatorial Pattern Matching

Pub Date : 2020-02-03 DOI: 10.4230/LIPIcs.CPM.2020.20

Tomohiro Koana, Vincent Froese, R. Niedermeier

Considering matrices with missing entries, we study NP-hard matrix completion problems where the resulting completed matrix must have a bounded (local) radius. In the pure radius version, this means that the goal is to fill in the entries such that there exists a 'center string' whose maximum Hamming distance to the matrix rows is as small as possible. In stringology, this problem is also known as Closest String with Wildcards. In the local radius version, the requested center string must be one of the rows of the completed matrix. Hermelin and Rozenberg [CPM 2014, TCS 2016] performed parameterized complexity studies for Closest String with Wildcards. We answer one of their open questions, fix a bug concerning a fixed-parameter tractability result in their work, and improve some upper running time bounds. For the local radius case, we reveal a computational complexity dichotomy. In general, our results indicate that, although it is NP-hard as well, the local radius variant often allows for faster (fixed-parameter) algorithms.
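
To make the two objectives concrete, here is a brute-force Python sketch for tiny instances. It assumes missing entries are written as `*` and that the alphabet is given explicitly; both conventions and all function names are illustrative assumptions. It is exponential in the number of columns and is not one of the paper's parameterized algorithms. It only shows that a wildcard is best filled with the center's character, and that the local variant restricts the center to a completion of some row.

```python
from itertools import product

WILDCARD = "*"  # assumed marker for a missing matrix entry


def distance(row: str, center: str) -> int:
    """Hamming distance from `row` to `center` when every wildcard in `row`
    is filled with the center's character (the best possible choice)."""
    return sum(1 for r, c in zip(row, center) if r != WILDCARD and r != c)


def radius(rows: list[str], alphabet: str) -> int:
    """Pure radius: the center may be any string over `alphabet`."""
    width = len(rows[0])
    return min(
        max(distance(row, "".join(center)) for row in rows)
        for center in product(alphabet, repeat=width)
    )


def local_radius(rows: list[str], alphabet: str) -> int:
    """Local radius: the center must be a completion of one of the rows."""
    best = len(rows[0])  # the radius can never exceed the row length
    for row in rows:
        # Each way of filling this row's wildcards is a candidate center.
        choices = [alphabet if ch == WILDCARD else ch for ch in row]
        for center in product(*choices):
            worst = max(distance(other, "".join(center)) for other in rows)
            best = min(best, worst)
    return best


print(radius(["aa", "bb"], "ab"), local_radius(["aa", "bb"], "ab"))  # 1 2
```

With rows `aa` and `bb`, the free center `ab` is at distance 1 from both, whereas a center that must itself be a row is at distance 2 from the other row.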

Citations: 9

Chaining with overlaps revisited

Annual Symposium on Combinatorial Pattern Matching

Pub Date : 2020-01-19 DOI: 10.4230/LIPIcs.CPM.2020.25

V. Mäkinen, Kristoffer Sahlin

Chaining algorithms aim to form a semi-global alignment of two sequences based on a set of anchoring local alignments as input. Depending on the optimization criteria and the exact definition of a chain, there are several $O(n \log n)$ time algorithms to solve this problem optimally, where $n$ is the number of input anchors. In this paper, we focus on a formulation allowing the anchors to overlap in a chain. This formulation was studied by Shibuya and Kurochkin (WABI 2003), but their algorithm comes with no proof of correctness. We revisit and modify their algorithm to consider a strict definition of precedence relation on anchors, adding the derivation required to establish the correctness of the resulting algorithm, which runs in $O(n \log^2 n)$ time on anchors formed by exact matches. With the more relaxed definition of precedence relation considered by Shibuya and Kurochkin, or when anchors are non-nested such as matches of uniform length ($k$-mers), the algorithm takes $O(n \log n)$ time. We also establish a connection between chaining with overlaps and the widely studied longest common subsequence (LCS) problem.
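
As a point of reference, the following Python sketch implements the textbook quadratic-time chaining dynamic program for *non-overlapping* anchors: anchor j may precede anchor i only if it ends before i starts in both sequences, and a chain is scored by its total anchor length. The `Anchor` type, this precedence test, and this scoring are illustrative assumptions. The overlap-allowing formulation studied in the paper uses a different precedence relation, and the $O(n \log n)$ algorithms mentioned above typically replace the inner loop with a search-tree or range-maximum structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    a: int       # start position in the first sequence
    b: int       # start position in the second sequence
    length: int  # length of the exact match (assumed >= 1)


def chain_score(anchors: list[Anchor]) -> int:
    """Maximum total length of a chain of pairwise non-overlapping anchors,
    by the classic O(n^2) dynamic program."""
    order = sorted(anchors, key=lambda x: (x.a, x.b))
    best = [x.length for x in order]  # best[i]: best chain ending at order[i]
    for i, cur in enumerate(order):
        for j in range(i):
            prev = order[j]
            # prev must end before cur starts in both sequences.
            if prev.a + prev.length <= cur.a and prev.b + prev.length <= cur.b:
                best[i] = max(best[i], best[j] + cur.length)
    return max(best, default=0)


anchors = [Anchor(0, 0, 3), Anchor(3, 3, 2), Anchor(1, 5, 4)]
print(chain_score(anchors))  # 5
```

Sorting by start position is enough here because any valid predecessor starts strictly earlier in the first sequence.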

Citations: 12

Maximal Common Subsequence Algorithms

Annual Symposium on Combinatorial Pattern Matching

Pub Date : 2019-11-01 DOI: 10.4230/LIPIcs.CPM.2018.1

Y. Sakai

A common subsequence of two strings is maximal if inserting any character into it no longer yields a common subsequence of the two strings. The present article proposes a (sub)linearithmic-time, linear-space algorithm for finding a maximal common subsequence of two strings and also proposes a linear-time algorithm for determining whether a common subsequence of two strings is maximal.
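
The definition of maximality can be checked directly by trying every single-character insertion. The Python sketch below (function names are our own) does exactly that in polynomial but far from linear time, so it is only a brute-force reference for the linear-time test described in the abstract.

```python
def is_subsequence(w: str, s: str) -> bool:
    """True if w can be obtained from s by deleting characters."""
    it = iter(s)
    return all(ch in it for ch in w)  # `in` advances the iterator


def is_maximal_common_subsequence(w: str, x: str, y: str) -> bool:
    """Brute-force maximality test, straight from the definition."""
    if not (is_subsequence(w, x) and is_subsequence(w, y)):
        return False
    # Only characters occurring in both strings can be inserted and stay common.
    for c in set(x) & set(y):
        for pos in range(len(w) + 1):
            longer = w[:pos] + c + w[pos:]
            if is_subsequence(longer, x) and is_subsequence(longer, y):
                return False
    return True


print(is_maximal_common_subsequence("ab", "abc", "acb"))  # True
print(is_maximal_common_subsequence("a", "abc", "acb"))   # False
```

Note that `ab` is maximal here even though it is not longer than the other maximal common subsequence `ac`; maximality is about insertions, not length.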

Citations: 8

On the Size of Overlapping Lempel-Ziv and Lyndon Factorizations

Annual Symposium on Combinatorial Pattern Matching

Pub Date : 2019-06-01 DOI: 10.4230/LIPIcs.CPM.2019.29

Y. Urabe, Yuto Nakashima, Shunsuke Inenaga, H. Bannai, M. Takeda

Lempel-Ziv (LZ) factorization and Lyndon factorization are well-known factorizations of strings. Recently, Kärkkäinen et al. studied the relation between the sizes of the two factorizations, and showed that the size of the Lyndon factorization is always smaller than twice the size of the non-overlapping LZ factorization [STACS 2017]. In this paper, we consider a similar problem for the overlapping version of the LZ factorization. Since the size of the overlapping LZ factorization is always smaller than the size of the non-overlapping LZ factorization and, in fact, can even be an O(log n) factor smaller, it is not immediately clear whether a similar bound as in previous work would hold. Nevertheless, in this paper, we prove that the size of the Lyndon factorization is always smaller than four times the size of the overlapping LZ factorization.
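
For experimenting with the two factorization sizes on small strings, the Python sketch below uses Duval's algorithm for the Lyndon factorization and a naive overlapping LZ factorization. The LZ convention here (each factor is the longest prefix of the remaining suffix that also starts at an earlier position, with the source allowed to overlap the factor, or a single fresh character if there is none) is one common variant and an assumption; papers differ in such details.

```python
def lyndon_factorization(s: str) -> list[str]:
    """Duval's algorithm: O(n) time, factors are lexicographically non-increasing."""
    factors, i, n = [], 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def lz_overlapping(s: str) -> list[str]:
    """Naive overlapping LZ factorization; cubic time in the worst case."""
    factors, i, n = [], 0, len(s)
    while i < n:
        longest = 0
        for src in range(i):  # every earlier starting position
            length = 0
            # The source may run past i, i.e. overlap the factor being built.
            while i + length < n and s[src + length] == s[i + length]:
                length += 1
            longest = max(longest, length)
        step = max(longest, 1)  # a fresh character becomes its own factor
        factors.append(s[i:i + step])
        i += step
    return factors


s = "banana"
print(lyndon_factorization(s))  # ['b', 'an', 'an', 'a']
print(lz_overlapping(s))        # ['b', 'a', 'n', 'ana']
```

On `aaaa` the overlapping factorization is just `a`, `aaa` (the second factor copies from position 0 and overlaps itself), while the Lyndon factorization has four factors `a`.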

Citations: 9

Hamming Distance Completeness

Annual Symposium on Combinatorial Pattern Matching

Pub Date : 2019-06-01 DOI: 10.4230/LIPIcs.CPM.2019.14

K. Labib, P. Uznański, D. Wolleb-Graf

We show, given a binary integer function $\diamond$ that is piecewise polynomial, that $(+,\diamond)$ vector products are equivalent under one-to-polylog reductions to the computation of the Hamming distance. Examples include the dominance and $\ell_{2p+1}$ distances for constant $p$. Our results imply equivalence (up to polylog factors) between the complexity of computing All Pairs Hamming Distance, All Pairs $\ell_{2p+1}$ Distance and Dominance Matrix Product, and equivalence between Hamming Distance Pattern Matching, $\ell_{2p+1}$ Pattern Matching and Less-Than Pattern Matching. The resulting algorithms for $\ell_{2p+1}$ Pattern Matching and All Pairs $\ell_{2p+1}$ Distance, for $2p+1 = 3, 5, 7, \ldots$, are likely to be optimal, given the lack of progress in improving upper bounds for Hamming distance in the past 30 years. While reductions between selected pairs of products were presented in the past, our work is the first to generalize them to a general class of functions, showing that a wide class of "intermediate" complexity problems are in fact equivalent.
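
The following Python sketch spells out what a $(+,\diamond)$ product is and how Hamming distance pattern matching fits that mold, using naive loops rather than any of the reductions in the paper. The dominance convention `x <= y` is an assumption, and the $\ell_{2p+1}$ operator returns $|x-y|^{2p+1}$, so the product gives the $(2p+1)$-th power of the $\ell_{2p+1}$ distance (taking the root does not change comparisons). All names are illustrative.

```python
from typing import Callable, Sequence

Op = Callable[[int, int], int]


def plus_diamond(a: Sequence[int], b: Sequence[int], op: Op) -> int:
    """(+, op) product of equal-length vectors: the sum of op(a[i], b[i])."""
    return sum(op(x, y) for x, y in zip(a, b))


def hamming(x: int, y: int) -> int:
    return int(x != y)


def dominance(x: int, y: int) -> int:
    return int(x <= y)  # assumed convention for the dominance product


def l_odd(p: int) -> Op:
    """|x - y|^(2p+1); summing it gives the (2p+1)-th power of the l_{2p+1} distance."""
    return lambda x, y: abs(x - y) ** (2 * p + 1)


def pattern_matching(text: Sequence[int], pattern: Sequence[int], op: Op) -> list[int]:
    """Naive O(nm) scan; with op=hamming this is Hamming distance pattern matching."""
    m = len(pattern)
    return [plus_diamond(text[i:i + m], pattern, op) for i in range(len(text) - m + 1)]


print(pattern_matching([1, 2, 1, 2], [1, 2], hamming))  # [0, 2, 0]
```

Swapping `hamming` for `l_odd(1)` or `dominance` changes only the per-coordinate function; the abstract's claim is that, for piecewise polynomial functions, the resulting problems are equally hard up to polylog factors.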

Citations: 14

Indexing the Bijective BWT

Annual Symposium on Combinatorial Pattern Matching

Pub Date : 2019-06-01 DOI: 10.4230/LIPIcs.CPM.2019.17

H. Bannai, Juha Kärkkäinen, D. Köppl, Marcin Piatkowski

The Burrows-Wheeler transform (BWT) is a permutation whose applications are prevalent in data compression and text indexing. The bijective BWT is a bijective variant of it that has not yet been studied for text indexing applications. We fill this gap by proposing a self-index built on the bijective BWT. The self-index applies the backward search technique of the FM-index to find a pattern P with O(|P| lg |P|) backward search steps.
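
To show what the bijective BWT is, here is a naive Python construction, not the self-index from the paper: split the text into its Lyndon factors with Duval's algorithm, sort all rotations of all factors by comparing their infinite periodic repetitions, and output the last character of each rotation. Helper names are illustrative, and the sort is far from linear time.

```python
def lyndon_factorization(s: str) -> list[str]:
    """Duval's algorithm: lexicographically non-increasing Lyndon factors."""
    factors, i, n = [], 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def bijective_bwt(s: str) -> str:
    factors = lyndon_factorization(s)
    if not factors:
        return ""
    # Every rotation of every Lyndon factor, kept as its own string.
    rotations = [f[i:] + f[:i] for f in factors for i in range(len(f))]
    # Two infinite powers u^omega and v^omega are equal once they agree on the
    # first |u| + |v| symbols, so a prefix of twice the longest factor suffices.
    depth = 2 * max(len(f) for f in factors)
    rotations.sort(key=lambda r: (r * (depth // len(r) + 1))[:depth])
    return "".join(r[-1] for r in rotations)


print(bijective_bwt("banana"))  # annbaa
```

Unlike the ordinary BWT, no end-of-string sentinel is appended, which is why the output has exactly the same length and characters as the input.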

Citations: 8

An Invertible Transform for Efficient String Matching in Labeled Digraphs

Annual Symposium on Combinatorial Pattern Matching

Pub Date : 2019-05-09 DOI: 10.4230/LIPIcs.CPM.2021.20

Abhinav Nellore, Austin Nguyen, Reid F. Thompson

Let $G = (V, E)$ be a digraph where each vertex is unlabeled, each edge is labeled by a character in some alphabet $\Omega$, and any two edges with both the same head and the same tail have different labels. The powerset construction gives a transform of $G$ into a weakly connected digraph $G' = (V', E')$ that enables solving the decision problem of whether there is a walk in $G$ matching an arbitrarily long query string $q$ in time linear in $|q|$ and independent of $|E|$ and $|V|$. We show $G$ is uniquely determined by $G'$ when for every $v_\ell \in V$, there is some distinct string $s_\ell$ on $\Omega$ such that $v_\ell$ is the origin of a closed walk in $G$ matching $s_\ell$, and no other walk in $G$ matches $s_\ell$ unless it starts and ends at $v_\ell$. We then exploit this invertibility condition to strategically alter any $G$ so its transform $G'$ enables retrieval of all $t$ terminal vertices of walks in the unaltered $G$ matching $q$ in $O(|q| + t \log |V|)$ time. We conclude by proposing two defining properties of a class of transforms that includes the Burrows-Wheeler transform and the transform presented here.
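
The powerset construction can be sketched as an ordinary subset construction in which every vertex is a start state. In the Python sketch below, the names and the `(tail, label, head)` edge format are assumptions. Each state of the transformed graph stores its vertex set explicitly, so a query costs $O(|q|)$ dictionary lookups and also reports terminal vertices; the number of states can be exponential in $|V|$, and this is not the invertibility-based retrieval described in the abstract.

```python
from collections import defaultdict, deque


def powerset_transform(vertices, edges):
    """Subset construction on an edge-labeled digraph.
    edges: iterable of (tail, label, head). Returns (start, delta), where
    start is the set of all vertices and delta[S][c] is the set of heads of
    c-labeled edges leaving S."""
    out = defaultdict(lambda: defaultdict(set))
    for tail, label, head in edges:
        out[tail][label].add(head)
    start = frozenset(vertices)
    delta, queue = {}, deque([start])
    while queue:
        state = queue.popleft()
        if state in delta:
            continue  # already expanded
        moves = defaultdict(set)
        for v in state:
            for label, heads in out[v].items():
                moves[label] |= heads
        delta[state] = {c: frozenset(h) for c, h in moves.items()}
        queue.extend(t for t in delta[state].values() if t not in delta)
    return start, delta


def match(start, delta, q):
    """Follow q from the start state; returns the terminal vertices of all
    walks matching q (an empty set means no such walk exists)."""
    state = start
    for c in q:
        state = delta[state].get(c)
        if state is None:
            return frozenset()
    return state


start, delta = powerset_transform([0, 1, 2], [(0, "a", 1), (1, "b", 2), (2, "a", 0)])
print(sorted(match(start, delta, "ab")))  # [2]
```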

Citations: 3

Conversion from RLBWT to LZ77

Annual Symposium on Combinatorial Pattern Matching

Pub Date : 2019-02-14 DOI: 10.4230/LIPIcs.CPM.2019.9

T. Nishimoto, Yasuo Tabei

Converting a compressed format of a string into another compressed format without explicit decompression is one of the central research topics in string processing. We discuss the problem of converting the run-length Burrows-Wheeler Transform (RLBWT) of a string to the Lempel-Ziv 77 (LZ77) phrases of the reversed string. Let $n$ be the length of the string, $r$ the number of runs in the RLBWT, $z$ the number of LZ77 phrases, and $\sigma$ the alphabet size of the RLBWT. The first results, obtained with Policriti and Prezza's conversion algorithm [Algorithmica 2018], were $O(n \log r)$ time and $O(r)$ working space. Recent results with Kempa's conversion algorithm [SODA 2019] are $O(n / \log n + r \log^{9} n + z \log^{9} n)$ time and $O(n / \log_{\sigma} n + r \log^{8} n)$ working space. In this paper, we present a new conversion algorithm by improving Policriti and Prezza's conversion algorithm, which uses general-purpose dynamic data structures. We argue that these dynamic data structures can be replaced, and we present new data structures for faster conversion. With the new data structures, our conversion algorithm runs in $O(n \min \{ \log \log n, \sqrt{\frac{\log r}{\log \log r}} \})$ time and $O(r)$ working space.
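
The quantities $n$ and $r$ can be illustrated with a small Python sketch that builds the BWT naively, run-length encodes it, and inverts it with the LF-mapping. This is precisely the explicit decompression that the conversion algorithms above avoid. The `$` sentinel (assumed absent from the text and smaller than every text character) and the function names are illustrative conventions.

```python
from itertools import groupby

SENTINEL = "$"  # assumed end marker, smaller than every text character


def bwt(s: str) -> str:
    """Naive BWT: last column of the sorted rotations of s + SENTINEL."""
    t = s + SENTINEL
    return "".join(r[-1] for r in sorted(t[i:] + t[:i] for i in range(len(t))))


def run_length_encode(last: str) -> list[tuple[str, int]]:
    """The RLBWT: one (character, run length) pair per run; r = len(result)."""
    return [(c, len(list(g))) for c, g in groupby(last)]


def invert_rlbwt(runs: list[tuple[str, int]]) -> str:
    """Expand the runs, then walk the LF-mapping backwards through the text."""
    last = "".join(c * k for c, k in runs)
    counts: dict[str, int] = {}
    for c in last:
        counts[c] = counts.get(c, 0) + 1
    smaller, total = {}, 0  # smaller[c]: number of characters less than c
    for c in sorted(counts):
        smaller[c] = total
        total += counts[c]
    seen: dict[str, int] = {}
    lf = []  # lf[i] = smaller[last[i]] + occurrences of last[i] in last[:i]
    for c in last:
        lf.append(smaller[c] + seen.get(c, 0))
        seen[c] = seen.get(c, 0) + 1
    out, row = [], 0  # row 0 is the rotation starting with the sentinel
    while last[row] != SENTINEL:
        out.append(last[row])
        row = lf[row]
    return "".join(reversed(out))


runs = run_length_encode(bwt("banana"))
print(runs)                # [('a', 1), ('n', 2), ('b', 1), ('$', 1), ('a', 2)]
print(invert_rlbwt(runs))  # banana
```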

Citations: 4

Faster queries for longest substring palindrome after block edit

Annual Symposium on Combinatorial Pattern Matching

Pub Date : 2019-01-30 DOI: 10.4230/LIPIcs.CPM.2019.27

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, H. Bannai, M. Takeda

Palindromes are important objects in strings which have been extensively studied from combinatorial, algorithmic, and bioinformatics points of view. Manacher [J. ACM 1975] proposed a seminal algorithm that computes the longest substring palindromes (LSPals) of a given string T in O(n) time, where n is the length of T. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and reports the length of the LSPals in O(l + log log n) time after a substring in T is replaced by a string of arbitrary length l. This outperforms the query algorithm proposed in our previous work [CPM 2018], which uses O(l + log n) time per query.
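
Manacher's algorithm itself is short. The Python sketch below computes the length of a longest substring palindrome in O(n) time, and answers a block edit naively by rebuilding the string and rerunning the algorithm, i.e., O(n + l) per query. That baseline is only useful for checking results; it is not the O(l + log log n) query algorithm of the paper. The function names and the half-open `[i, j)` edit interval are assumptions.

```python
def longest_palindrome_length(s: str) -> int:
    """Manacher's algorithm: length of a longest palindromic substring, O(n)."""
    if not s:
        return 0
    # Interleave separators so even- and odd-length palindromes look alike.
    t = "#" + "#".join(s) + "#"
    radius = [0] * len(t)
    center = right = 0
    for i in range(len(t)):
        if i < right:  # reuse the mirror position's radius inside the box
            radius[i] = min(right - i, radius[2 * center - i])
        while (i - radius[i] - 1 >= 0 and i + radius[i] + 1 < len(t)
               and t[i - radius[i] - 1] == t[i + radius[i] + 1]):
            radius[i] += 1
        if i + radius[i] > right:
            center, right = i, i + radius[i]
    # A radius r in t corresponds to a palindrome of length r in s.
    return max(radius)


def lspal_after_edit(s: str, i: int, j: int, block: str) -> int:
    """Naive query: replace s[i:j] by `block` and rerun Manacher."""
    return longest_palindrome_length(s[:i] + block + s[j:])


print(lspal_after_edit("abcxy", 3, 5, "ba"))  # 5, since the result is "abcba"
```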

Citations: 7

Computing runs on a trie

Annual Symposium on Combinatorial Pattern Matching

Pub Date : 2019-01-30 DOI: 10.4230/LIPIcs.CPM.2019.23

Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, H. Bannai, M. Takeda

A maximal repetition, or run, in a string is a periodically maximal substring whose smallest period is at most half the length of the substring. In this paper, we consider runs that correspond to a path on a trie, or in other words, on a rooted edge-labeled tree where one endpoint of the path must be an ancestor of the other. For a trie with $n$ edges, we show that the number of runs is less than $n$. We also show an $O(n\sqrt{\log n}\log \log n)$ time and $O(n)$ space algorithm for counting and finding the shallower endpoint of all runs. We further show an $O(n\sqrt{\log n}\log^2 \log n)$ time and $O(n)$ space algorithm for finding both endpoints of all runs.
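
For a string, which is a trie consisting of a single path, runs can be enumerated by brute force directly from the definition. The Python sketch below (illustrative names; roughly cubic time, unlike the algorithms in the paper) reports each run as a half-open interval `[start, end)` together with its smallest period.

```python
def smallest_period(u: str) -> int:
    """Smallest q >= 1 with u[k] == u[k + q] for all valid k (naive)."""
    for q in range(1, len(u) + 1):
        if all(u[k] == u[k + q] for k in range(len(u) - q)):
            return q
    return 0  # only reached for the empty string


def runs(s: str) -> list[tuple[int, int, int]]:
    """All runs of s as (start, end, period), with s[start:end] the run."""
    n, found = len(s), set()
    for p in range(1, n // 2 + 1):
        k = 0
        while k < n - p:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            # Extend the maximal block of positions with s[k] == s[k + p].
            while k < n - p and s[k] == s[k + p]:
                k += 1
            end = k + p  # s[start:end] has period p and cannot be extended
            if end - start >= 2 * p and smallest_period(s[start:end]) == p:
                found.add((start, end, p))
    return sorted(found)


print(runs("aabb"))  # [(0, 2, 1), (2, 4, 1)]
```

The string `aabb` corresponds to a path trie with 4 edges and has 2 runs, in line with the bound of fewer runs than edges.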

Citations: 6
