Massimo Cairo, Romeo Rizzi, Alexandru I. Tomescu, Elia C. Zirondelli
Genome assembly asks to reconstruct an unknown string from many shorter substrings of it. Although it is one of the key problems in Bioinformatics, it has generally lacked major theoretical advances. Its hardness stems both from practical issues (the size and errors of real data) and from the fact that problem formulations inherently admit multiple solutions. Given these, at their core, most state-of-the-art assemblers are based on finding non-branching paths (unitigs) in an assembly graph. While such paths constitute only partial assemblies, they are likely to be correct. More precisely, if one defines a genome assembly solution as a closed arc-covering walk of the graph, then unitigs appear in all solutions and are thus safe partial solutions. Until recently, it was an open question which walks of an assembly graph are safe. Tomescu and Medvedev (RECOMB 2016) characterized all such safe walks (omnitigs), thus giving the first safe and complete genome assembly algorithm. Even though maximal omnitig finding was later improved to quadratic time by Cairo et al. (ACM Trans. Algorithms 2019), it remained open whether the crucial linear-time feature of finding unitigs can be attained with omnitigs. We answer this question affirmatively by describing a surprising O(m)-time algorithm to identify all maximal omnitigs of a graph with n nodes and m arcs, notwithstanding the existence of families of graphs with Θ(mn) total maximal omnitig size. This is based on the discovery of a family of walks (macrotigs) with the property that all the non-trivial omnitigs are univocal extensions of subwalks of a macrotig. This has two consequences: (1) a linear-time output-sensitive algorithm enumerating all maximal omnitigs; (2) a compact O(m) representation of all maximal omnitigs, which allows, e.g., O(m)-time computation of various statistics on them.
Our results close a long-standing theoretical question inspired by practical genome assemblers, originating with the use of unitigs in 1995. We envision our results to be at the core of a reverse transfer from theory to practical and complete genome assembly programs, as has been the case for other key Bioinformatics problems.
Genome assembly, from practice to theory: safe, complete and linear-time · ACM Transactions on Algorithms · Pub Date: 2023-11-08 · DOI: https://doi.org/10.1145/3632176
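For orientation, the unitigs that this abstract takes as its starting point can be illustrated with the textbook notion of maximal non-branching paths: paths whose internal nodes all have in-degree 1 and out-degree 1. The Python sketch below assumes that definition and a graph given as a list of directed arcs; the function name `maximal_non_branching_paths` is hypothetical. It is only the simple unitig baseline, not the omnitig or macrotig algorithm of the paper.

```python
from collections import defaultdict


def maximal_non_branching_paths(edges):
    """Return maximal non-branching paths (unitig-like) as lists of nodes."""
    out_adj = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        out_adj[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))

    def one_in_one_out(x):
        return indeg[x] == 1 and len(out_adj[x]) == 1

    paths, visited = [], set()
    # Paths that start at a branching (or source/sink) node.
    for v in sorted(nodes):
        if not one_in_one_out(v):
            for w in out_adj[v]:
                path = [v, w]
                while one_in_one_out(w):  # extend through 1-in-1-out nodes
                    visited.add(w)
                    w = out_adj[w][0]
                    path.append(w)
                paths.append(path)
    # Remaining 1-in-1-out nodes lie on isolated cycles.
    for v in sorted(nodes):
        if one_in_one_out(v) and v not in visited:
            cycle, w = [v], out_adj[v][0]
            visited.add(v)
            while w != v:
                visited.add(w)
                cycle.append(w)
                w = out_adj[w][0]
            cycle.append(v)
            paths.append(cycle)
    return paths
```

For example, on the arcs `(1, 2), (2, 3), (3, 4), (3, 5)` the sketch reports `[1, 2, 3]`, `[3, 4]` and `[3, 5]`, because node 3 branches.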
A Dyck sequence is a balanced sequence of opening and closing parentheses (of various types). The Dyck edit distance of a given sequence of parentheses S is the smallest number of edit operations (insertions, deletions, and substitutions) needed to transform S into a Dyck sequence. We consider the threshold Dyck edit distance problem, where the input is a sequence of parentheses S of length n and a positive integer k, and the goal is to compute the Dyck edit distance of S only if the distance is at most k, and otherwise report that the distance is larger than k. Backurs and Onak [PODS’16] showed that the threshold Dyck edit distance problem can be solved in $O(n + k^{16})$ time. In this work, we design new algorithms for the threshold Dyck edit distance problem that run in $O(n + k^{4.544184})$ time with high probability or $O(n + k^{4.853059})$ time deterministically. Our algorithms combine several new structural properties of the Dyck edit distance problem, a refined algorithm for fast (min, +) matrix product, and a careful modification of ideas used in Valiant’s parsing algorithm.
An Improved Algorithm for The k-Dyck Edit Distance Problem · Dvir Fried, Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, Ely Porat, Tatiana Starikovskaya · ACM Transactions on Algorithms · Pub Date: 2023-10-19 · DOI: https://doi.org/10.1145/3627539
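To make the definition concrete, the Dyck edit distance can be computed by a naive cubic interval dynamic program over non-crossing matchings: each character is either left unmatched (cost 1, since it can be deleted or given an inserted partner) or matched with a later character (cost 0, 1 or 2 substitutions). This is a minimal sketch assuming the bracket alphabet `()`, `[]`, `{}`; the names `dyck_edit_distance` and `threshold_dyck_edit_distance` are hypothetical, and the threshold wrapper simply runs the full DP, so it is nowhere near the paper's $O(n + \mathrm{poly}(k))$ bounds.

```python
from functools import lru_cache

# Assumed bracket alphabet: each opening symbol mapped to its closing partner.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def pair_cost(a: str, b: str) -> int:
    """Substitutions needed so that a (left) and b (right) form a matched pair."""
    if a in PAIRS and b in CLOSERS:
        return 0 if PAIRS[a] == b else 1  # right orientation, maybe wrong type
    if a in PAIRS or b in CLOSERS:
        return 1  # exactly one endpoint has the wrong orientation
    return 2  # a is closing and b is opening: both must change


def dyck_edit_distance(s: str) -> int:
    """Cubic-time DP: cheapest non-crossing matching plus unmatched characters."""

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        # Minimum cost to turn s[i:j] into a Dyck sequence.
        if i >= j:
            return 0
        # Option 1: s[i] stays unmatched (delete it or insert a partner).
        best = 1 + d(i + 1, j)
        # Option 2: match s[i] with s[k]; inside and outside are independent.
        for k in range(i + 1, j):
            best = min(best, pair_cost(s[i], s[k]) + d(i + 1, k) + d(k + 1, j))
        return best

    return d(0, len(s))


def threshold_dyck_edit_distance(s: str, k: int):
    """Return the distance if it is at most k, otherwise None."""
    dist = dyck_edit_distance(s)
    return dist if dist <= k else None
```

For instance, `"([)]"` has distance 2: one substitution cannot balance it, and odd-length results are never Dyck, so a single insertion or deletion cannot help either.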
Given a graph $G = (V, E)$ and an integer k, the Cluster Editing problem asks whether we can transform G into a union of vertex-disjoint cliques by at most k modifications (edge deletions or insertions). In this paper, we study the following variant of Cluster Editing. We are given a graph $G = (V, E)$, a packing $\mathcal{H}$ of modification-disjoint induced $P_3$s (no pair of $P_3$s in $\mathcal{H}$ shares an edge or non-edge), and an integer ℓ. The task is to decide whether G can be transformed into a union of vertex-disjoint cliques by at most $\ell + |\mathcal{H}|$ modifications (edge deletions or insertions). We show that this problem is NP-hard even when ℓ = 0 (in which case the problem asks to turn G into a disjoint union of cliques by performing exactly one edge deletion or insertion per element of $\mathcal{H}$) and when each vertex is in at most 23 $P_3$s of the packing. This answers negatively a question of van Bevern, Froese, and Komusiewicz (CSR 2016, ToCS 2018), repeated by C. Komusiewicz at Shonan meeting no. 144 in March 2019. We then initiate the study of the largest integer c such that the problem remains tractable when restricted to packings in which each vertex is in at most c packed $P_3$s. Here packed $P_3$s are those belonging to the packing $\mathcal{H}$. Van Bevern et al.
showed that the case c = 1 is fixed-parameter tractable with respect to ℓ, and we show that the case c = 2 is solvable in $|V|^{2\ell + O(1)}$ time.
Cluster Editing parameterized above modification-disjoint $P_3$-packings · Shaohua Li, Marcin Pilipczuk, Manuel Sorge · ACM Transactions on Algorithms · Pub Date: 2023-10-11 · DOI: https://doi.org/10.1145/3626526
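The definitions above can be checked mechanically. A graph is a disjoint union of cliques exactly when it has no induced $P_3$ (a path u-v-w with the pair u, w non-adjacent), and a packing is modification-disjoint when no two packed $P_3$s share any of their three vertex pairs. Since every packed $P_3$ must be destroyed by modifying one of its own pairs, and those pairs are disjoint across the packing, any solution needs at least $|\mathcal{H}|$ modifications; this is why the budget is measured as ℓ above $|\mathcal{H}|$. The helper names below are hypothetical, and this is only a brute-force illustration of the definitions, not an algorithm from the paper.

```python
from itertools import combinations


def induced_p3s(vertices, edges):
    """List induced P3s as (u, v, w) with v in the middle and u, w non-adjacent."""
    E = {frozenset(e) for e in edges}
    result = []
    for v in vertices:  # v is the middle vertex of the path
        nbrs = sorted(u for u in vertices if frozenset((u, v)) in E)
        for u, w in combinations(nbrs, 2):
            if frozenset((u, w)) not in E:
                result.append((u, v, w))
    return result


def is_cluster_graph(vertices, edges):
    """A graph is a disjoint union of cliques iff it has no induced P3."""
    return not induced_p3s(vertices, edges)


def is_modification_disjoint(packing):
    """No two packed P3s may share an edge or a non-edge (i.e. a vertex pair)."""
    seen = set()
    for u, v, w in packing:
        for pair in (frozenset((u, v)), frozenset((v, w)), frozenset((u, w))):
            if pair in seen:
                return False
            seen.add(pair)
    return True
```

On the path 0-1-2-3, the two induced $P_3$s (0, 1, 2) and (1, 2, 3) share the edge {1, 2}, so they cannot both belong to a modification-disjoint packing.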
Lars Gottesbüren, Tobias Heuer, Nikolai Maas, Peter Sanders, Sebastian Schlag
Balanced hypergraph partitioning is an NP-hard problem with many applications, e.g., optimizing communication in distributed data placement problems. The goal is to place all nodes across k different blocks of bounded size, such that hyperedges span as few parts as possible. This problem is well studied in sequential and distributed settings, but not in shared memory. We close this gap by devising efficient and scalable shared-memory algorithms for all components employed in the best sequential solvers, without compromising solution quality. This work presents the scalable and high-quality hypergraph partitioning framework Mt-KaHyPar. Its most important components are parallel improvement algorithms based on the FM (Fiduccia-Mattheyses) algorithm and maximum flows, as well as a parallel clustering algorithm for coarsening, which are used in a multilevel scheme with log(n) levels. As additional components, we parallelize the n-level partitioning scheme, devise a deterministic version of our algorithm, and present optimizations for plain graphs. We evaluate our solver on more than 800 graphs and hypergraphs, and compare it with 25 different algorithms from the literature. Our fastest configuration outperforms almost all existing hypergraph partitioners with regard to both solution quality and running time. Our highest-quality configuration achieves the same solution quality as the best sequential partitioner KaHyPar, while being an order of magnitude faster with ten threads. Thus, two of our configurations occupy all fronts of the Pareto curve for hypergraph partitioning. Furthermore, our solvers exhibit good speedups, e.g., 29.6x in the geometric mean on 64 cores (deterministic), 22.3x (log(n)-level), and 25.9x (n-level).
Scalable High-Quality Hypergraph Partitioning · ACM Transactions on Algorithms · Pub Date: 2023-10-09 · DOI: https://doi.org/10.1145/3626527
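The abstract states the objective informally ("hyperedges span as few parts as possible"). A common formalization in the KaHyPar line of work, assumed here rather than taken from this abstract, is the connectivity metric: the sum over hyperedges e of λ(e) − 1, where λ(e) is the number of blocks e touches. The balance constraint is likewise assumed to be the usual one for unit node weights, with every block holding at most (1 + ε)⌈n/k⌉ nodes. The function names are hypothetical; the sketch only evaluates a given partition and does not partition anything.

```python
import math
from collections import Counter


def connectivity_objective(hyperedges, part):
    """Sum over hyperedges e of (lambda(e) - 1); lambda(e) = blocks touched by e."""
    return sum(len({part[v] for v in e}) - 1 for e in hyperedges)


def is_balanced(part, k, eps):
    """Check each of the k blocks holds at most (1 + eps) * ceil(n / k) unit-weight nodes."""
    capacity = (1 + eps) * math.ceil(len(part) / k)
    sizes = Counter(part.values())
    return all(sizes[b] <= capacity for b in range(k))
```

Here `part` maps each node to a block index in `range(k)`; a hyperedge kept entirely inside one block contributes 0 to the objective.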
We consider the energy complexity of the leader election problem in the single-hop radio network model, where each device v has a unique identifier $\mathrm{ID}(v) \in \{1, 2, \ldots, N\}$. Energy is a scarce resource for small battery-powered devices. For such devices, most of the energy is often spent on communication, not on computation. To approximate the actual energy cost, the energy complexity of an algorithm is defined as the maximum, over all devices, of the number of time slots in which the device transmits or listens. Much progress has been made in understanding the energy complexity of leader election in radio networks, but very little is known about the tradeoff between time and energy. Chang et al. [STOC 2017] showed that the optimal deterministic energy complexity of leader election is $\Theta(\log \log N)$ if each device can simultaneously transmit and listen, but left open the problem of determining the optimal time complexity under any given energy constraint. Time–energy tradeoff: For any $k \geq \log \log N$, we show that a leader among at most n devices can be elected deterministically in $O(k \cdot n^{1+\varepsilon}) + O(k \cdot N^{1/k})$ time and $O(k)$ energy if each device can simultaneously transmit and listen, where $\varepsilon > 0$ is any small constant. This improves upon the previous $O(N)$-time, $O(\log \log N)$-energy algorithm by Chang et al. [STOC 2017]. We provide lower bounds showing that the time–energy tradeoff of our algorithm is near-optimal. Dense instances: For dense instances, where the number of devices is $n = \Theta(N)$, we design a deterministic leader election algorithm using only $O(1)$ energy. This improves upon the $O(\log^* N)$-energy algorithm by Jurdziński, Kutyłowski, and Zatopiański [PODC 2002] and the $O(\alpha(N))$-energy algorithm by Chang et al. [STOC 2017].
More specifically, we show that the optimal deterministic energy complexity of leader election is $\Theta(\max\{1, \log \frac{N}{n}\})$ if each device cannot simultaneously transmit and listen, and it is $\Theta(\max\{1, \log \log \frac{N}{n}\})$ if each device can simultaneously transmit and listen.
Near-Optimal Time-Energy Trade-Offs for Deterministic Leader Election · Yi-Jun Chang, Ran Duan, Shunhua Jiang · ACM Transactions on Algorithms · Pub Date: 2023-09-26 · DOI: https://doi.org/10.1145/3614429
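To make the energy measure concrete, here is a deliberately naive baseline written for this illustration (it is not an algorithm from the paper): scan the ID space slot by slot; in slot i the device with ID i, if present, transmits, and every other device listens until the first transmission is heard. It works even when devices cannot transmit and listen in the same slot. Because every device is active in every slot until a leader appears, both time and energy equal the smallest present ID, which can be as large as N − n + 1. The helper names are hypothetical.

```python
def energy_complexity(active_slots):
    """Energy = max over devices of the number of slots spent transmitting or listening."""
    return max(active_slots.values())


def naive_leader_election(ids, N):
    """Naive scan over IDs 1..N; the smallest present ID becomes the leader.

    Returns (leader, time, active_slots_per_device).
    """
    present = set(ids)
    active = {v: 0 for v in present}
    for slot in range(1, N + 1):
        for v in present:
            active[v] += 1  # the device with ID == slot transmits, all others listen
        if slot in present:
            return slot, slot, active
    raise ValueError("no device present")
```

With devices `{5, 9, 12}` and N = 16, the leader is 5 and every device is awake for 5 slots, so the energy is 5; the algorithms in the abstract reduce this linear-in-N energy to O(k) or even O(1).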
Let G be a directed multigraph on n vertices and m edges with a designated source vertex s and a designated sink vertex t. We study the (s, t)-cuts of capacity minimum+1 and, as an important application of them, we give a solution to the dual edge sensitivity problem for (s, t)-mincuts: reporting an (s, t)-mincut upon the failure or insertion of any pair of edges. Picard and Queyranne [Mathematical Programming Studies, 13(1):8-16, 1980] showed that there exists a directed acyclic graph (DAG) that compactly stores all minimum (s, t)-cuts of G. This structure also acts as an oracle for the single edge sensitivity of minimum (s, t)-cuts. For undirected multigraphs, Dinitz and Nutov [STOC, pages 509-518, 1995] showed that there exists an $\mathcal{O}(n)$-size 2-level cactus model that stores all global cuts of capacity minimum+1. However, for minimum+1 (s, t)-cuts, no such compact structure was known to date. We present the following structural and algorithmic results on minimum+1 (s, t)-cuts. (1) Structure: There is an $\mathcal{O}(m)$-size 2-level DAG structure that stores all minimum+1 (s, t)-cuts of G such that each minimum+1 (s, t)-cut appears as a 3-transversal cut, i.e., it intersects any path in this structure at most thrice. We also show that there is an $\mathcal{O}(mn)$-size structure for storing and characterizing all minimum+1 (s, t)-cuts in terms of 1-transversal cuts. (2) Data structure: There exists an $\mathcal{O}(n^2)$-size data structure that, given a pair of vertices {u, v} that are not separated by an (s, t)-mincut, can determine in $\mathcal{O}(1)$ time whether there exists a minimum+1 (s, t)-cut, say (A, B), such that s, u ∈ A and v, t ∈ B; the corresponding cut can be reported in $\mathcal{O}(|B|)$ time. (3) Sensitivity oracle: There exists an $\mathcal{O}(n^2)$-size data structure that solves the dual edge sensitivity problem for (s, t)-mincuts.
It takes $\mathcal{O}(1)$ time to report the capacity of a resulting (s, t)-mincut (A, B) and $\mathcal{O}(|B|)$ time to report the cut. (4) Lower bounds: For the data structure problems addressed in (2) and (3) above, we also provide a matching conditional lower bound. We establish a close relationship among three seemingly unrelated problems: the all-pairs directed reachability problem, the dual edge sensitivity problem for (s, t)-mincuts, and the problem of reporting the capacity of the ({x, y}, {u, v})-mincut for any four vertices x, y, u, v in G. Assuming the Directed Reachability Hypothesis by Patrascu [SIAM J. Computing, pages 827–847, 2011] and Goldstein et al. [WADS, pages 421-436, 2017], this leads to $\tilde{\Omega}(n^2)$ lower bounds on the space for the latter two problems.
Minimum+1 (s,t)-cuts and Dual Edge Sensitivity Oracle · Surender Baswana, Koustav Bhanja, Abhyuday Pandey · ACM Transactions on Algorithms · Pub Date: 2023-09-07 · DOI: https://doi.org/10.1145/3623271
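The notions of a minimum+1 (s, t)-cut and of query (2) can be illustrated by exhaustive enumeration on tiny graphs. The sketch below assumes vertices numbered `0..n-1`, unit-capacity arcs given as a list (parallel arcs repeated), and a cut (A, B) represented by its source side A; its capacity is the number of arcs from A to B. The function names are hypothetical, and the enumeration is exponential in n, so it only serves to check definitions, in contrast to the $\mathcal{O}(m)$-size structures described above.

```python
from itertools import combinations


def cut_capacity(edges, A):
    """Number of arcs (counted with multiplicity) leaving the source side A."""
    return sum(1 for x, y in edges if x in A and y not in A)


def st_cuts(n, s, t):
    """Yield every source side A with s in A and t not in A (exponential!)."""
    others = [v for v in range(n) if v not in (s, t)]
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            yield frozenset((s,) + extra)


def min_and_min_plus_one_cuts(n, edges, s, t):
    """Return the mincut capacity and all source sides of capacity minimum+1."""
    cuts = [(A, cut_capacity(edges, A)) for A in st_cuts(n, s, t)]
    lam = min(c for _, c in cuts)
    return lam, [A for A, c in cuts if c == lam + 1]


def min_plus_one_separates(n, edges, s, t, u, v):
    """Is there a minimum+1 (s, t)-cut (A, B) with u in A and v in B?"""
    _, plus_one = min_and_min_plus_one_cuts(n, edges, s, t)
    return any(u in A and v not in A for A in plus_one)
```

For the arcs `(0,1), (0,2), (1,3), (2,3), (1,2)` with s = 0 and t = 3, the mincut capacity is 2 and the only minimum+1 cut has source side {0, 1}, so the query with u = 1, v = 2 succeeds while u = 2, v = 1 fails.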
Pub Date: 2023-07-24 · DOI: https://dl.acm.org/doi/10.1145/3610227
Arnold Filtser, Omrit Filtser
Given a curve P with points in $\mathbb{R}^d$ arriving in a streaming fashion, and parameters ε > 0 and k, we construct a distance oracle that uses $O(\frac{1}{\varepsilon})^{kd} \log \varepsilon^{-1}$ space and, given a query curve Q with k points in $\mathbb{R}^d$, returns in $\tilde{O}(kd)$ time a $(1 + \varepsilon)$-approximation of the discrete Fréchet distance between Q and P. In addition, we construct simplifications in the streaming model and an oracle for distance queries to a sub-curve (in the static setting), and we introduce the zoom-in problem. Our algorithms work in any dimension d, and therefore we generalize some useful tools and algorithms for curves under the discrete Fréchet distance to work efficiently in high dimensions.
Static and Streaming Data Structures for Fréchet Distance Queries · ACM Transactions on Algorithms
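For reference, the quantity that the oracle approximates can be computed exactly by the classic quadratic dynamic program of Eiter and Mannila: the distance for prefixes P[:i+1] and Q[:j+1] is the larger of the current point distance and the best of the three predecessor cells. The sketch below assumes points are given as equal-length tuples of floats in any dimension d and uses `math.dist` (Python 3.8+); the name `discrete_frechet` is hypothetical. It takes O(|P||Q|d) time and stores the whole curve, unlike the streaming oracle in the abstract.

```python
import math


def discrete_frechet(P, Q):
    """Exact discrete Fréchet distance between point sequences P and Q in R^d."""
    n, m = len(P), len(Q)
    INF = float("inf")
    # ca[i][j]: discrete Fréchet distance between prefixes P[:i+1] and Q[:j+1].
    ca = [[INF] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            else:
                prev = min(
                    ca[i - 1][j] if i > 0 else INF,
                    ca[i][j - 1] if j > 0 else INF,
                    ca[i - 1][j - 1] if i > 0 and j > 0 else INF,
                )
                ca[i][j] = max(prev, d)
    return ca[n - 1][m - 1]
```

For P = (0,0), (1,0), (2,0) and Q = (0,1), (2,1), the optimal coupling pairs the middle point of P with either point of Q, giving a distance of √2.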
Pub Date: 2023-07-21 · DOI: https://dl.acm.org/doi/10.1145/3607141
Philip Bille, Inge Li Gørtz, Teresa Anna Steiner
Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In this paper, we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear-space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way, we develop several data-structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77-compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.
Title: String Indexing with Compressed Patterns (ACM Transactions on Algorithms, 2023)
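To make the setting concrete, here is a minimal Python sketch of the naive approach the paper avoids: the server decompresses an LZ77-compressed pattern and then scans S for occurrences. The phrase encoding (one-character literals plus absolute `(src, length)` copies that may overlap) and all function names are assumptions chosen for illustration; the paper's data structure answers the query without decompressing.

```python
from typing import List, Tuple, Union

# Hypothetical phrase encoding (an assumption for this sketch):
#   a one-character str is a literal;
#   a (src, length) tuple copies `length` characters starting at absolute
#   position `src` of the already-decoded pattern (overlap allowed, as in LZ77).
Phrase = Union[str, Tuple[int, int]]


def lz77_decode(phrases: List[Phrase]) -> str:
    out: List[str] = []
    for ph in phrases:
        if isinstance(ph, str):
            out.append(ph)
        else:
            src, length = ph
            if not 0 <= src < len(out):
                raise ValueError(f"copy source {src} out of range")
            # Copy one character at a time so a copy may overlap its own output.
            for i in range(length):
                out.append(out[src + i])
    return "".join(out)


def lz77_encode_greedy(p: str) -> List[Phrase]:
    """Greedy LZ77 parse (quadratic time; fine for short patterns)."""
    phrases: List[Phrase] = []
    i = 0
    while i < len(p):
        best_src, best_len = -1, 0
        for src in range(i):
            length = 0
            while i + length < len(p) and p[src + length] == p[i + length]:
                length += 1
            if length > best_len:
                best_src, best_len = src, length
        if best_len >= 2:  # only copy when it saves at least one phrase
            phrases.append((best_src, best_len))
            i += best_len
        else:
            phrases.append(p[i])
            i += 1
    return phrases


def naive_query(text: str, phrases: List[Phrase]) -> List[int]:
    """Baseline: decompress the pattern, then report every occurrence start."""
    pattern = lz77_decode(phrases)
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


phrases = lz77_encode_greedy("abababab")
print(phrases)                               # ['a', 'b', (0, 6)]
print(naive_query("xabababababy", phrases))  # [1, 3]
```

The decompression step costs time proportional to the uncompressed pattern length, which is exactly the dependence the paper's query time is designed to avoid.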
Pub Date: 2023-07-14 · DOI: https://dl.acm.org/doi/10.1145/3597640
Kevin Buchin, Chenglin Fan, Maarten Löffler, Aleksandr Popov, Benjamin Raichel, Marcel Roeloffzen
In this article, we study a wide range of variants for computing the (discrete and continuous) Fréchet distance between uncertain curves. An uncertain curve is a sequence of uncertainty regions, where each region is a disk, a line segment, or a set of points. A realisation of a curve is a polyline connecting one point from each region. Given an uncertain curve and a second (certain or uncertain) curve, we seek to compute the lower-bound and upper-bound Fréchet distances, which are the minimum and maximum Fréchet distance over all realisations of the curves.
We prove that both problems are NP-hard for the Fréchet distance in several uncertainty models, and that the upper bound problem remains hard for the discrete Fréchet distance. In contrast, the lower bound (discrete [5] and continuous) Fréchet distance can be computed in polynomial time in some models. Furthermore, we show that computing the expected (discrete and continuous) Fréchet distance is #P-hard in some models.
On the positive side, we present an FPTAS in constant dimension for the lower bound problem when Δ/δ is polynomially bounded, where δ is the Fréchet distance and Δ bounds the diameter of the regions. We also show a near-linear-time 3-approximation for the decision problem on roughly δ-separated convex regions. Finally, we study the setting with Sakoe–Chiba time bands, where we restrict the alignment between the curves, and give polynomial-time algorithms for the upper bound and expected discrete and continuous Fréchet distance for uncertainty modelled as point sets.
Title: Fréchet Distance for Uncertain Curves (ACM Transactions on Algorithms, 2023)
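For the point-set uncertainty model, the lower- and upper-bound definitions can be made concrete with a brute-force Python sketch: the classic Eiter-Mannila dynamic program computes the discrete Fréchet distance between two certain curves, and enumerating every realisation of the uncertain curve gives the minimum and maximum. This is an illustration of the definitions only, not one of the paper's algorithms; it assumes finite point-set regions and a certain second curve, the function names are hypothetical, and its running time grows with the product of the region sizes.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def discrete_frechet(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Discrete Fréchet distance via the Eiter-Mannila DP, O(|p| * |q|) time."""
    n, m = len(p), len(q)
    # ca[i][j] = discrete Fréchet distance between prefixes p[:i+1] and q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


def frechet_bounds(regions: List[List[Point]], q: Sequence[Point]) -> Tuple[float, float]:
    """(lower, upper) bound discrete Fréchet distance to a certain curve q.

    Each uncertainty region is a finite point set; a realisation picks one
    point per region. Enumerating all realisations is exponential in length.
    """
    values = [discrete_frechet(r, q) for r in itertools.product(*regions)]
    return min(values), max(values)


u = [[(0.0, 0.0), (0.0, 2.0)], [(1.0, 0.0)]]
q = [(0.0, 0.0), (1.0, 0.0)]
print(frechet_bounds(u, q))  # (0.0, 2.0)
```

Here the realisation through (0, 0) matches q exactly, while the one through (0, 2) is forced to pair (0, 2) with (0, 0), giving distance 2.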
Pub Date: 2023-07-14 · DOI: https://dl.acm.org/doi/10.1145/3594873
Enoch Peserico, Michele Scquizzato
We present a simple proof that no randomized online matching algorithm for the line can be $(\sqrt{\log_2(n+1)}/15)$-competitive against an oblivious adversary for any $n = 2^i - 1$ with $i \in \mathbb{N}$. This is the first super-constant lower bound for the problem, and as a corollary it disproves a recent conjecture on the topology-parametrized competitiveness achievable on generic spaces.
Title: Matching on the Line Admits no $o(\sqrt{\log n})$-Competitive Algorithm (ACM Transactions on Algorithms, 2023)
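Assuming the standard formulation of online min-cost matching on the line (servers at known positions, requests revealed one at a time and matched irrevocably to a free server, cost equal to the total distance), the competitive ratio compares an online algorithm's cost with the offline optimum. The Python sketch below computes both quantities on a single instance. The nearest-free-server greedy rule is a deterministic baseline chosen for illustration, not an algorithm from the paper (whose bound concerns randomized algorithms), and the function names are hypothetical.

```python
from typing import Sequence


def greedy_online_cost(servers: Sequence[float], requests: Sequence[float]) -> float:
    """Serve requests in arrival order, each by the nearest still-free server."""
    if len(requests) > len(servers):
        raise ValueError("more requests than servers")
    free = list(servers)
    total = 0.0
    for r in requests:
        # min() returns the first minimiser, so ties go to the earlier-listed server.
        k = min(range(len(free)), key=lambda i: abs(free[i] - r))
        total += abs(free.pop(k) - r)
    return total


def offline_opt_cost(servers: Sequence[float], requests: Sequence[float]) -> float:
    """Offline optimum on the line: with equally many servers and requests,
    pairing both sides in sorted order minimises the total distance."""
    if len(requests) != len(servers):
        raise ValueError("this sketch assumes equally many servers and requests")
    return sum(abs(s - r) for s, r in zip(sorted(servers), sorted(requests)))


servers, requests = [0.0, 2.0], [1.5, 2.0]
alg = greedy_online_cost(servers, requests)  # 0.5 + 2.0 = 2.5
opt = offline_opt_cost(servers, requests)    # 1.5 + 0.0 = 1.5
print(alg / opt)  # about 1.667 on this instance
```

On this instance greedy spends the server at 2 on the first request and must then send the second request all the way to 0; a lower bound like the one above says that, for every randomized online algorithm, some input sequence forces such losses to grow with n.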