ACM Transactions on Algorithms: latest articles (page 4)

Genome assembly, from practice to theory: safe, complete and linear-time

Zone 3 (Computer Science) Q3 COMPUTER SCIENCE, THEORY & METHODS

ACM Transactions on Algorithms

Pub Date : 2023-11-08 DOI: 10.1145/3632176

Massimo Cairo, Romeo Rizzi, Alexandru I. Tomescu, Elia C. Zirondelli

Genome assembly asks to reconstruct an unknown string from many shorter substrings of it. Even though it is one of the key problems in Bioinformatics, it is generally lacking major theoretical advances. Its hardness stems both from practical issues (size and errors of real data), and from the fact that problem formulations inherently admit multiple solutions. Given these, at their core, most state-of-the-art assemblers are based on finding non-branching paths (unitigs) in an assembly graph. While such paths constitute only partial assemblies, they are likely to be correct. More precisely, if one defines a genome assembly solution as a closed arc-covering walk of the graph, then unitigs appear in all solutions, being thus safe partial solutions. Until recently, it was open what are all the safe walks of an assembly graph. Tomescu and Medvedev (RECOMB 2016) characterized all such safe walks (omnitigs), thus giving the first safe and complete genome assembly algorithm. Even though maximal omnitig finding was later improved to quadratic time by Cairo et al. (ACM Trans. Algorithms 2019), it remained open whether the crucial linear-time feature of finding unitigs can be attained with omnitigs. We answer this question affirmatively, by describing a surprising O(m)-time algorithm to identify all maximal omnitigs of a graph with n nodes and m arcs, notwithstanding the existence of families of graphs with Θ(mn) total maximal omnitig size. This is based on the discovery of a family of walks (macrotigs) with the property that all the non-trivial omnitigs are univocal extensions of subwalks of a macrotig. This has two consequences: (1) A linear-time output-sensitive algorithm enumerating all maximal omnitigs. (2) A compact O(m) representation of all maximal omnitigs, which allows, e.g., for O(m)-time computation of various statistics on them. 
Our results close a long-standing theoretical question inspired by practical genome assemblers, originating with the use of unitigs in 1995. We envision our results to be at the core of a reverse transfer from theory to practical and complete genome assembly programs, as has been the case for other key Bioinformatics problems.
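
To make the notion of a unitig concrete, here is a minimal Python sketch that lists the maximal non-branching paths of a directed graph. It covers only unitigs, not the omnitigs or macrotigs of the paper, and the function name `unitigs`, the arc-list input format, and the string node labels are assumptions made for illustration.

```python
from collections import defaultdict


def unitigs(arcs):
    """Return the maximal non-branching paths of a directed graph.

    `arcs` is a list of (tail, head) pairs. A node is "internal" when it
    has exactly one incoming and one outgoing arc; a unitig is a path
    whose internal nodes all have that property.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    def is_internal(v):
        return len(pred[v]) == 1 and len(succ[v]) == 1

    paths, visited = [], set()
    # Start a path at every arc leaving a branching (non-internal) node.
    for v in sorted(nodes):
        if is_internal(v):
            continue
        for w in succ[v]:
            path = [v, w]
            while is_internal(w):
                visited.add(w)
                w = succ[w][0]
                path.append(w)
            paths.append(path)
    # Remaining internal nodes lie on isolated cycles.
    for v in sorted(nodes):
        if is_internal(v) and v not in visited:
            cycle, w = [v], succ[v][0]
            visited.add(v)
            while w != v:
                visited.add(w)
                cycle.append(w)
                w = succ[w][0]
            cycle.append(v)
            paths.append(cycle)
    return paths


if __name__ == "__main__":
    graph = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "a"), ("e", "a")]
    print(unitigs(graph))
```

Each arc is visited a constant number of times, so this sketch runs in time linear in the number of arcs (plus sorting the nodes, which is only done for reproducible output).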

Citations: 11

An Improved Algorithm for the k-Dyck Edit Distance Problem

Zone 3 (Computer Science) Q3 COMPUTER SCIENCE, THEORY & METHODS

ACM Transactions on Algorithms

Pub Date : 2023-10-19 DOI: 10.1145/3627539

Dvir Fried, Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, Ely Porat, Tatiana Starikovskaya

A Dyck sequence is a balanced sequence of opening and closing parentheses (of various types). The Dyck edit distance of a given sequence of parentheses S is the smallest number of edit operations (insertions, deletions, and substitutions) needed to transform S into a Dyck sequence. We consider the threshold Dyck edit distance problem, where the input is a sequence of parentheses S and a positive integer k, and the goal is to compute the Dyck edit distance of S only if the distance is at most k, and otherwise report that the distance is larger than k. Backurs and Onak [PODS'16] showed that the threshold Dyck edit distance problem can be solved in \(O(n + k^{16})\) time. In this work, we design new algorithms for the threshold Dyck edit distance problem that run in \(O(n + k^{4.544184})\) time with high probability or \(O(n + k^{4.853059})\) time deterministically. Our algorithms combine several new structural properties of the Dyck edit distance problem, a refined algorithm for fast (min, +) matrix product, and a careful modification of ideas used in Valiant's parsing algorithm.
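
As a point of reference for the problem definition, the following Python sketch computes the Dyck edit distance with a textbook cubic-time interval dynamic program and then applies the threshold k. It is not the algorithm of the paper; the names `dyck_edit_distance` and `threshold_dyck`, the three bracket types, and the use of `None` to report "larger than k" are illustrative assumptions.

```python
from functools import lru_cache

# Assumed alphabet: three parenthesis types.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def pair_cost(a, b):
    """Substitutions needed so that a (left) and b (right) form a matching pair."""
    if a in PAIRS:
        # a is an opener: b is fine only if it is the matching closer.
        return 0 if PAIRS[a] == b else 1
    # a is a closer and must be substituted; b too if it is not a closer.
    return 1 if b in CLOSERS else 2


def dyck_edit_distance(s):
    """Cubic-time interval DP over substrings s[i:j]."""

    @lru_cache(maxsize=None)
    def d(i, j):
        if i >= j:
            return 0
        # Option 1: delete s[i] (inserting a partner for it costs the same).
        best = 1 + d(i + 1, j)
        # Option 2: pair s[i] with some later s[k].
        for k in range(i + 1, j):
            best = min(best, pair_cost(s[i], s[k]) + d(i + 1, k) + d(k + 1, j))
        return best

    return d(0, len(s))


def threshold_dyck(s, k):
    """Return the distance if it is at most k, otherwise None."""
    dist = dyck_edit_distance(s)
    return dist if dist <= k else None


if __name__ == "__main__":
    for text in ["([])", "(]", "((", ")(", "([)]"]:
        print(text, dyck_edit_distance(text))
```

This baseline always pays cubic time in n regardless of k; the appeal of threshold algorithms is that their dependence on n is linear when k is small.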

Citations: 0

Cluster Editing parameterized above modification-disjoint P₃-packings

Zone 3 (Computer Science) Q3 COMPUTER SCIENCE, THEORY & METHODS

ACM Transactions on Algorithms

Pub Date : 2023-10-11 DOI: 10.1145/3626526

Shaohua Li, Marcin Pilipczuk, Manuel Sorge

Given a graph G = (V, E) and an integer k, the Cluster Editing problem asks whether we can transform G into a union of vertex-disjoint cliques by at most k modifications (edge deletions or insertions). In this paper, we study the following variant of Cluster Editing. We are given a graph G = (V, E), a packing \(\mathcal{H}\) of modification-disjoint induced \(P_3\)s (no pair of \(P_3\)s in \(\mathcal{H}\) share an edge or non-edge) and an integer ℓ. The task is to decide whether G can be transformed into a union of vertex-disjoint cliques by at most \(\ell + |\mathcal{H}|\) modifications (edge deletions or insertions). We show that this problem is NP-hard even when ℓ = 0 (in which case the problem asks to turn G into a disjoint union of cliques by performing exactly one edge deletion or insertion per element of \(\mathcal{H}\)) and when each vertex is in at most 23 \(P_3\)s of the packing. This answers negatively a question of van Bevern, Froese, and Komusiewicz (CSR 2016, ToCS 2018), repeated by C. Komusiewicz at Shonan meeting no. 144 in March 2019. We then initiate the study to find the largest integer c such that the problem remains tractable when restricting to packings such that each vertex is in at most c packed \(P_3\)s. Here packed \(P_3\)s are those belonging to the packing \(\mathcal{H}\). Van Bevern et al. showed that the case c = 1 is fixed-parameter tractable with respect to ℓ and we show that the case c = 2 is solvable in \(|V|^{2\ell + O(1)}\) time.
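
The two definitions used above can be checked mechanically. The Python sketch below lists the induced P₃s of a small graph (a graph is a union of disjoint cliques exactly when it has none) and tests whether a set of P₃s is modification-disjoint, meaning no two of them share a vertex pair. The adjacency-dictionary format and the function names are assumptions made for illustration.

```python
from itertools import combinations


def induced_p3s(adj):
    """List induced paths on three vertices as (end, centre, end) triples.

    `adj` maps each vertex to the set of its neighbours (undirected graph).
    """
    found = []
    for centre in sorted(adj):
        for u, w in combinations(sorted(adj[centre]), 2):
            if w not in adj[u]:  # the two ends must be non-adjacent
                found.append((u, centre, w))
    return found


def is_cluster_graph(adj):
    """A graph is a disjoint union of cliques iff it has no induced P3."""
    return not induced_p3s(adj)


def is_modification_disjoint(packing):
    """True if no two P3s in the packing share an edge or a non-edge,
    i.e. no vertex pair occurs in more than one P3."""
    seen = set()
    for p3 in packing:
        for pair in combinations(sorted(p3), 2):
            if pair in seen:
                return False
            seen.add(pair)
    return True


if __name__ == "__main__":
    # Path a - b - c - d
    path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    p3s = induced_p3s(path)
    print(p3s, is_modification_disjoint(p3s))
```

Because every induced P₃ needs at least one modification and modification-disjoint P₃s cannot share one, a packing of size |H| gives a lower bound of |H| modifications, which is why the parameter ℓ measures the excess above the packing.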

Citations: 5

Scalable High-Quality Hypergraph Partitioning

Zone 3 (Computer Science) Q3 COMPUTER SCIENCE, THEORY & METHODS

ACM Transactions on Algorithms

Pub Date : 2023-10-09 DOI: 10.1145/3626527

Lars Gottesbüren, Tobias Heuer, Nikolai Maas, Peter Sanders, Sebastian Schlag

Balanced hypergraph partitioning is an NP-hard problem with many applications, e.g., optimizing communication in distributed data placement problems. The goal is to place all nodes across k different blocks of bounded size, such that hyperedges span as few parts as possible. This problem is well-studied in sequential and distributed settings, but not in shared-memory. We close this gap by devising efficient and scalable shared-memory algorithms for all components employed in the best sequential solvers without compromises with regard to solution quality. This work presents the scalable and high-quality hypergraph partitioning framework Mt-KaHyPar. Its most important components are parallel improvement algorithms based on the FM algorithm and maximum flows, as well as a parallel clustering algorithm for coarsening, which are used in a multilevel scheme with log(n) levels. As additional components, we parallelize the n-level partitioning scheme, devise a deterministic version of our algorithm, and present optimizations for plain graphs. We evaluate our solver on more than 800 graphs and hypergraphs, and compare it with 25 different algorithms from the literature. Our fastest configuration outperforms almost all existing hypergraph partitioners with regard to both solution quality and running time. Our highest-quality configuration achieves the same solution quality as the best sequential partitioner KaHyPar, while being an order of magnitude faster with ten threads. Thus, two of our configurations occupy all fronts of the Pareto curve for hypergraph partitioning. Furthermore, our solvers exhibit good speedups, e.g., 29.6x in the geometric mean on 64 cores (deterministic), 22.3x (log(n)-level), and 25.9x (n-level).
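
The objective described above can be evaluated in a few lines. The Python sketch below scores a given partition with the connectivity metric (for each hyperedge, the number of blocks it spans minus one) and checks a balance constraint. The abstract does not state which metric or balance definition is used, so both, along with the unit node weights and all names, are assumptions; this is not the Mt-KaHyPar API.

```python
from collections import Counter
from math import ceil


def connectivity_minus_one(hyperedges, block_of):
    """Sum over hyperedges of (number of blocks spanned - 1).

    `hyperedges` is a list of node lists; `block_of` maps node -> block id.
    """
    return sum(len({block_of[v] for v in edge}) - 1 for edge in hyperedges)


def is_balanced(block_of, k, eps):
    """Assumed balance rule: every block holds at most (1 + eps) * ceil(n / k)
    nodes, with unit node weights."""
    limit = (1 + eps) * ceil(len(block_of) / k)
    sizes = Counter(block_of.values())
    return all(sizes[b] <= limit for b in range(k))


if __name__ == "__main__":
    hyperedges = [[0, 1, 2], [2, 3], [3, 4, 5], [0, 5]]
    block_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    print(connectivity_minus_one(hyperedges, block_of))
    print(is_balanced(block_of, k=2, eps=0.03))
```

In this example two of the four hyperedges cross the cut between the blocks, so the score is 2, and both blocks hold three nodes, which satisfies the assumed balance rule.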

Citations: 2

Near-Optimal Time-Energy Trade-Offs for Deterministic Leader Election

Zone 3 (Computer Science) Q3 COMPUTER SCIENCE, THEORY & METHODS

ACM Transactions on Algorithms

Pub Date : 2023-09-26 DOI: 10.1145/3614429

Yi-Jun Chang, Ran Duan, Shunhua Jiang

We consider the energy complexity of the leader election problem in the single-hop radio network model, where each device v has a unique identifier \(ID(v) \in \{1, 2, \ldots, N\}\). Energy is a scarce resource for small battery-powered devices. For such devices, most of the energy is often spent on communication, not on computation. To approximate the actual energy cost, the energy complexity of an algorithm is defined as the maximum over all devices of the number of time slots where the device transmits or listens. Much progress has been made in understanding the energy complexity of leader election in radio networks, but very little is known about the tradeoff between time and energy. Chang et al. [STOC 2017] showed that the optimal deterministic energy complexity of leader election is \(\Theta(\log \log N)\) if each device can simultaneously transmit and listen, but left open the problem of determining the optimal time complexity under any given energy constraint. Time-energy tradeoff: For any \(k \ge \log \log N\), we show that a leader among at most n devices can be elected deterministically in \(O(k \cdot n^{1+\varepsilon}) + O(k \cdot N^{1/k})\) time and \(O(k)\) energy if each device can simultaneously transmit and listen, where ε > 0 is any small constant. This improves upon the previous \(O(N)\)-time \(O(\log \log N)\)-energy algorithm by Chang et al. [STOC 2017]. We provide lower bounds to show that the time-energy tradeoff of our algorithm is near-optimal. Dense instances: For the dense instances where the number of devices is \(n = \Theta(N)\), we design a deterministic leader election algorithm using only \(O(1)\) energy. This improves upon the \(O(\log^* N)\)-energy algorithm by Jurdziński, Kutyłowski, and Zatopiański [PODC 2002] and the \(O(\alpha(N))\)-energy algorithm by Chang et al. [STOC 2017]. More specifically, we show that the optimal deterministic energy complexity of leader election is \(\Theta(\max\{1, \log \tfrac{N}{n}\})\) if each device cannot simultaneously transmit and listen, and it is \(\Theta(\max\{1, \log \log \tfrac{N}{n}\})\) if each device can simultaneously transmit and listen.
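
To illustrate how energy complexity is counted, the Python sketch below simulates a deliberately naive baseline, not any algorithm from the paper: in slot t the device with ID t transmits if it has heard nothing so far, and every other device listens until the first transmission. The function name and the returned triple are illustrative assumptions.

```python
def naive_leader_election(ids, N):
    """Simulate a naive single-hop election over ID space {1, ..., N}.

    Returns (leader, time_slots, energy), where energy is the maximum over
    all devices of the number of slots in which the device transmits or
    listens. Assumes `ids` are distinct integers in 1..N.
    """
    ids = set(ids)
    if not ids or min(ids) < 1 or max(ids) > N:
        raise ValueError("ids must be a non-empty subset of 1..N")
    awake_slots = {v: 0 for v in ids}
    for t in range(1, N + 1):
        # Every device is awake in slot t: device t transmits, others listen.
        for v in ids:
            awake_slots[v] += 1
        if t in ids:
            # First transmission heard: t is the smallest ID, hence the leader.
            return t, t, max(awake_slots.values())
    raise AssertionError("unreachable: some ID in 1..N must transmit")


if __name__ == "__main__":
    print(naive_leader_election({5, 9, 12}, N=16))
```

In the worst case (a single device with ID N) this baseline spends N slots and N units of energy per device, which shows why bounding the number of awake slots separately from the running time is the interesting question.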

Citations: 1

Minimum+1 (s,t)-cuts and Dual Edge Sensitivity Oracle

IF 1.3 Zone 3 (Computer Science) Q3 COMPUTER SCIENCE, THEORY & METHODS

ACM Transactions on Algorithms

Pub Date : 2023-09-07 DOI: 10.1145/3623271

Surender Baswana, Koustav Bhanja, Abhyuday Pandey

Let G be a directed multi-graph on n vertices and m edges with a designated source vertex s and a designated sink vertex t. We study the (s, t)-cuts of capacity minimum+1 and, as an important application of them, we give a solution to the dual edge sensitivity problem for (s, t)-mincuts: reporting an (s, t)-mincut upon failure or insertion of any pair of edges. Picard and Queyranne [Mathematical Programming Studies, 13(1):8-16, 1980] showed that there exists a directed acyclic graph (DAG) that compactly stores all minimum (s, t)-cuts of G. This structure also acts as an oracle for the single edge sensitivity of minimum (s, t)-cut. For undirected multi-graphs, Dinitz and Nutov [STOC, pages 509-518, 1995] showed that there exists an \(\mathcal{O}(n)\) size 2-level cactus model that stores all global cuts of capacity minimum+1. However, for minimum+1 (s, t)-cuts, no such compact structure is known to date. We present the following structural and algorithmic results on minimum+1 (s, t)-cuts. (1) Structure: There is an \(\mathcal{O}(m)\) size 2-level DAG structure that stores all minimum+1 (s, t)-cuts of G such that each minimum+1 (s, t)-cut appears as a 3-transversal cut, that is, it intersects any path in this structure at most thrice. We also show that there is an \(\mathcal{O}(mn)\) size structure for storing and characterizing all minimum+1 (s, t)-cuts in terms of 1-transversal cuts. (2) Data structure: There exists an \(\mathcal{O}(n^2)\) size data structure that, given a pair of vertices {u, v} which are not separated by an (s, t)-mincut, can determine in \(\mathcal{O}(1)\) time if there exists a minimum+1 (s, t)-cut, say (A, B), such that s, u ∈ A and v, t ∈ B; the corresponding cut can be reported in \(\mathcal{O}(|B|)\) time. (3) Sensitivity oracle: There exists an \(\mathcal{O}(n^2)\) size data structure that solves the dual edge sensitivity problem for (s, t)-mincuts. It takes \(\mathcal{O}(1)\) time to report the capacity of a resulting (s, t)-mincut (A, B) and \(\mathcal{O}(|B|)\) time to report the cut. (4) Lower bounds: For the data structure problems addressed in (2) and (3) above, we also provide a matching conditional lower bound. We establish a close relationship among three seemingly unrelated problems: the all-pairs directed reachability problem, the dual edge sensitivity problem for (s, t)-mincuts, and the problem of reporting the capacity of ({x, y}, {u, v})-mincut for any four vertices x, y, u, v in G. Assuming the Directed Reachability Hypothesis by Patrascu [SIAM J. Computing, pages 827-847, 2011] and Goldstein et al. [WADS, pages 421-436, 2017], this leads to \(\tilde{\Omega}(n^2)\) lower bounds on the space for the latter two problems.
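
To make "cuts of capacity minimum+1" concrete, the Python sketch below enumerates every (s, t)-cut of a tiny directed multi-graph by brute force and groups the source sides by capacity. It takes exponential time and is meant only to illustrate the objects being stored, not the compact structures of the paper; unit capacity per parallel edge and all names are assumptions.

```python
from itertools import combinations


def st_cuts_by_capacity(vertices, edges, s, t):
    """Map capacity -> list of source sides A (frozensets) with s in A, t not in A.

    `edges` is a list of (tail, head) pairs; parallel edges are repeated.
    The capacity of (A, B) is the number of edges going from A to B.
    """
    others = [v for v in vertices if v not in (s, t)]
    cuts = {}
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            side = frozenset((s,) + extra)
            capacity = sum(1 for u, v in edges if u in side and v not in side)
            cuts.setdefault(capacity, []).append(side)
    return cuts


def min_and_min_plus_one_cuts(vertices, edges, s, t):
    """Return (minimum capacity, minimum cuts, minimum+1 cuts)."""
    cuts = st_cuts_by_capacity(vertices, edges, s, t)
    lam = min(cuts)
    return lam, cuts[lam], cuts.get(lam + 1, [])


if __name__ == "__main__":
    V = ["s", "a", "b", "t"]
    E = [("s", "a"), ("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("a", "b")]
    lam, mincuts, plus_one = min_and_min_plus_one_cuts(V, E, "s", "t")
    print(lam, mincuts, plus_one)
```

On this example there is a single minimum cut of capacity 2 but three cuts of capacity 3, a small hint that minimum+1 cuts can be more numerous than minimum cuts.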

Citations: 0

Static and Streaming Data Structures for Fréchet Distance Queries

IF 1.3 Zone 3 (Computer Science) Q3 COMPUTER SCIENCE, THEORY & METHODS

ACM Transactions on Algorithms

Pub Date : 2023-07-24 DOI: https://dl.acm.org/doi/10.1145/3610227

Arnold Filtser, Omrit Filtser

Given a curve P with points in \(\mathbb{R}^d\) in a streaming fashion, and parameters ε > 0 and k, we construct a distance oracle that uses \(O(\frac{1}{\varepsilon})^{kd}\log \varepsilon^{-1}\) space, and given a query curve Q with k points in \(\mathbb{R}^d\), returns in \(\tilde{O}(kd)\) time a 1 + ε approximation of the discrete Fréchet distance between Q and P. In addition, we construct simplifications in the streaming model, an oracle for distance queries to a sub-curve (in the static setting), and introduce the zoom-in problem. Our algorithms work in any dimension d, and therefore we generalize some useful tools and algorithms for curves under the discrete Fréchet distance to work efficiently in high dimensions.
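
For readers unfamiliar with the distance being approximated, the Python sketch below computes the exact discrete Fréchet distance between two point sequences with the standard quadratic-time dynamic program. It is a reference computation, not the streaming oracle of the paper; the function name and the tuple-based point format are assumptions.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def discrete_frechet(P, Q):
    """Exact discrete Fréchet distance between point sequences P and Q.

    dp[i][j] is the distance between the prefixes P[:i+1] and Q[:j+1]:
    the smallest possible maximum gap over all monotone joint traversals.
    """
    n, m = len(P), len(Q)
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            gap = dist(P[i], Q[j])
            if i == 0 and j == 0:
                dp[i][j] = gap
            elif i == 0:
                dp[i][j] = max(dp[0][j - 1], gap)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][0], gap)
            else:
                reach = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
                dp[i][j] = max(reach, gap)
    return dp[n - 1][m - 1]


if __name__ == "__main__":
    P = [(0, 0), (1, 0), (2, 0)]
    Q = [(0, 1), (1, 1), (2, 1)]
    print(discrete_frechet(P, Q))
```

The table needs both curves in memory and time proportional to the product of their lengths, which is exactly the cost a streaming oracle with small space and fast queries aims to avoid.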

Citations: 0

String Indexing with Compressed Patterns

IF 1.3, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, THEORY & METHODS

ACM Transactions on Algorithms

Pub Date : 2023-07-21 DOI: https://dl.acm.org/doi/10.1145/3607141

Philip Bille, Inge Li Gørtz, Teresa Anna Steiner

Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way we develop several data structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77 compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.
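To make "a pattern given in compressed form" concrete, here is a minimal sketch of one common textbook variant of the LZ77 parse, in which each phrase is a (source position, copy length, next character) triple and copies may overlap the phrase being written. The exact variant used in the paper may differ, and this naive quadratic-time factorizer, with its hypothetical names `lz77_factorize` and `lz77_decode`, only illustrates the format, not the indexing data structure.

```python
def lz77_factorize(s):
    """Greedy LZ77-style parse of s into (source_pos, length, next_char) phrases.

    Naive illustration: tries every earlier start position for each phrase.
    """
    phrases = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_src = 0, 0
        for src in range(i):
            length = 0
            # The copy may run into the phrase being built (self-reference);
            # stop one character early so a literal next_char always exists.
            while i + length < n - 1 and s[src + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, src
        phrases.append((best_src, best_len, s[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_decode(phrases):
    """Rebuild the string; copying one character at a time handles overlaps."""
    out = []
    for src, length, ch in phrases:
        for k in range(length):
            out.append(out[src + k])
        out.append(ch)
    return "".join(out)


pattern = "abababa"
compressed = lz77_factorize(pattern)
print(compressed)
print(lz77_decode(compressed) == pattern)
```

A highly repetitive pattern yields far fewer phrases than characters, which is the regime where processing the compressed query directly can pay off.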

Citations: 0

Fréchet Distance for Uncertain Curves

IF 1.3, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, THEORY & METHODS

ACM Transactions on Algorithms

Pub Date : 2023-07-14 DOI: https://dl.acm.org/doi/10.1145/3597640

Kevin Buchin, Chenglin Fan, Maarten Löffler, Aleksandr Popov, Benjamin Raichel, Marcel Roeloffzen

In this article, we study a wide range of variants for computing the (discrete and continuous) Fréchet distance between uncertain curves. An uncertain curve is a sequence of uncertainty regions, where each region is a disk, a line segment, or a set of points. A realisation of a curve is a polyline connecting one point from each region. Given an uncertain curve and a second (certain or uncertain) curve, we seek to compute the lower and upper bound Fréchet distance, which are the minimum and maximum Fréchet distance for any realisations of the curves.

We prove that both problems are NP-hard for the Fréchet distance in several uncertainty models, and that the upper bound problem remains hard for the discrete Fréchet distance. In contrast, the lower bound (discrete [5] and continuous) Fréchet distance can be computed in polynomial time in some models. Furthermore, we show that computing the expected (discrete and continuous) Fréchet distance is #P-hard in some models.

On the positive side, we present an FPTAS in constant dimension for the lower bound problem when Δ/δ is polynomially bounded, where δ is the Fréchet distance and Δ bounds the diameter of the regions. We also show a near-linear-time 3-approximation for the decision problem on roughly δ-separated convex regions. Finally, we study the setting with Sakoe–Chiba time bands, where we restrict the alignment between the curves, and give polynomial-time algorithms for the upper bound and expected discrete and continuous Fréchet distance for uncertainty modelled as point sets.
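The definitions of the lower and upper bound Fréchet distance can be illustrated by brute force when every uncertainty region is a finite point set and the second curve is certain. The sketch below enumerates all realisations, so it is exponential in the curve length and is emphatically not one of the paper's algorithms. The names `frechet_bounds` and `discrete_frechet` and the sample data are assumptions made for illustration.

```python
from itertools import product
from math import dist


def discrete_frechet(P, Q):
    """Discrete Fréchet distance via a row-by-row dynamic program."""
    prev = None
    for i, p in enumerate(P):
        cur = []
        for j, q in enumerate(Q):
            d = dist(p, q)
            if i == 0 and j == 0:
                reach = d
            elif i == 0:
                reach = max(cur[j - 1], d)
            elif j == 0:
                reach = max(prev[0], d)
            else:
                reach = max(min(prev[j], prev[j - 1], cur[j - 1]), d)
            cur.append(reach)
        prev = cur
    return prev[-1]


def frechet_bounds(uncertain, Q):
    """Return (lower, upper) bound over all realisations of a point-set curve.

    `uncertain` is a list of regions, each a non-empty list of candidate points;
    a realisation picks exactly one point per region.
    """
    values = [discrete_frechet(real, Q) for real in product(*uncertain)]
    return min(values), max(values)


# First vertex is either (0, 0) or (0, 2); second vertex is known exactly.
uncertain_curve = [[(0, 0), (0, 2)], [(1, 0)]]
certain_curve = [(0, 0), (1, 0)]
print(frechet_bounds(uncertain_curve, certain_curve))
```

With k candidate points in each of n regions there are k^n realisations, which is one way to see why efficient algorithms for these bounds are not a given.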

Citations: 0

Matching on the Line Admits no \(o(\sqrt{\log n})\)-Competitive Algorithm

IF 1.3, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, THEORY & METHODS

ACM Transactions on Algorithms

Pub Date : 2023-07-14 DOI: https://dl.acm.org/doi/10.1145/3594873

Enoch Peserico, Michele Scquizzato

We present a simple proof that no randomized online matching algorithm for the line can be \((\sqrt{\log_2(n+1)}/15)\)-competitive against an oblivious adversary for any n = 2ⁱ - 1 : i ∈ ℕ. This is the first super-constant lower bound for the problem, and disproves as a corollary a recent conjecture on the topology-parametrized competitiveness achievable on generic spaces.
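As a rough illustration of the problem setting (not of the paper's lower-bound construction), the sketch below assumes servers at fixed positions on the line and requests that arrive one at a time and must each be matched irrevocably to a free server. It compares a simple deterministic greedy rule with the offline optimum, which on a line is obtained by matching sorted servers to sorted requests. The function names and the instance are assumptions.

```python
def greedy_online_matching(servers, requests):
    """Match each arriving request to the nearest still-free server."""
    free = list(servers)
    cost = 0
    for r in requests:
        s = min(free, key=lambda x: abs(x - r))
        free.remove(s)
        cost += abs(s - r)
    return cost


def offline_optimum(servers, requests):
    """Optimal cost with full knowledge of the requests.

    Assumes as many requests as servers; on a line, pairing the i-th smallest
    server with the i-th smallest request minimises total distance.
    """
    return sum(abs(s - r) for s, r in zip(sorted(servers), sorted(requests)))


servers = [0, 3]
requests = [2, 3]  # arrival order matters for the online algorithm
online = greedy_online_matching(servers, requests)
optimal = offline_optimum(servers, requests)
print(online, optimal, online / optimal)
```

The ratio between the online cost and the offline optimum, taken in the worst case over instances, is the competitive ratio that the abstract bounds from below.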

Citations: 0