
Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures: Latest Papers

Achieving Sublinear Complexity under Constant T in T-interval Dynamic Networks
Ruomu Hou, Irvan Jahja, Yucheng Sun, Jiyan Wu, Haifeng Yu
This paper considers standard T-interval dynamic networks, where the N nodes in the network proceed in lock-step rounds and the topology can change arbitrarily from round to round, as determined by an adversary. The adversary promises that in every T consecutive rounds, the T (potentially different) topologies in those rounds contain a common connected subgraph that spans all nodes. In this setting, we propose novel algorithms for fundamental distributed computing problems such as Count, Consensus, and Max. Our algorithms are the first whose complexities do not contain an Ω(N) term under constant T values. Previous sublinear algorithms require significantly larger T values.
DOI: https://doi.org/10.1145/3490148.3538571 | Published: 2022-07-11
Citations: 0
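The T-interval promise described in the abstract above can be made concrete with a small checker. This is a minimal sketch, not the paper's algorithm: it only tests whether a given sequence of round topologies satisfies the promise. The names `edge`, `is_connected`, and `is_t_interval_connected` are hypothetical, nodes are assumed to be numbered 0..n-1 with n ≥ 1, and each round is assumed to be given as a set of undirected edges.

```python
from typing import FrozenSet, List, Set

Edge = FrozenSet[int]


def edge(u: int, v: int) -> Edge:
    """Undirected edge between two distinct nodes (hypothetical helper)."""
    return frozenset((u, v))


def is_connected(n: int, edges: Set[Edge]) -> bool:
    """Depth-first search from node 0; True if all n nodes are reached."""
    adj = {v: set() for v in range(n)}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def is_t_interval_connected(n: int, rounds: List[Set[Edge]], t: int) -> bool:
    """Check that every window of t consecutive rounds shares a connected
    spanning subgraph, i.e. the intersection of their edge sets is connected.
    Sequences shorter than t pass vacuously."""
    for start in range(len(rounds) - t + 1):
        common = set.intersection(*rounds[start:start + t])
        if not is_connected(n, common):
            return False
    return True
```

A triangle whose edges rotate every round is connected in each individual round (T = 1) but fails the promise for T = 2, because consecutive rounds share only a single edge.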
Contention Resolution for Coded Radio Networks
M. A. Bender, Seth Gilbert, F. Kuhn, John Kuszmaul, M. Médard
Randomized backoff protocols, such as exponential backoff, are a powerful tool for managing access to a shared resource, often a wireless communication channel (e.g., [1]). For a wireless device to transmit successfully, it uses a backoff protocol to ensure exclusive access to the channel. Modern radios, however, do not need exclusive access to the channel to communicate; in particular, they can receive useful information even when more than one device transmits at the same time. These capabilities have been exploited for many years by systems that rely on interference cancellation, physical layer network coding, and analog network coding to improve efficiency. For example, Zigzag decoding [56] demonstrated how a base station can decode messages sent by multiple devices simultaneously. In this paper, we address the following question: Can we design a backoff protocol that is better than exponential backoff when exclusive channel access is not required? We define the Coded Radio Network Model, which generalizes traditional radio network models (e.g., [30]). We then introduce the Decodable Backoff Algorithm, a randomized backoff protocol that achieves an optimal throughput of 1 - o(1). (Throughput 1 is optimal, as simultaneous reception does not increase the channel capacity.) The algorithm breaks the constant throughput lower bound for traditional radio networks [47-49], showing the power of these new hardware capabilities.
DOI: https://doi.org/10.1145/3490148.3538573 | Published: 2022-07-11
Citations: 0
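For context on the baseline that the abstract above compares against, here is a minimal simulation of classical windowed binary exponential backoff on a slotted channel where a slot succeeds only if exactly one device transmits in it. It is a sketch under stated assumptions, not the Decodable Backoff Algorithm: the function name `windowed_backoff_slots`, the batch arrival of all packets at time zero, and the window-doubling schedule are illustrative choices.

```python
import random


def windowed_backoff_slots(n_devices: int, seed: int = 0) -> int:
    """Simulate windowed binary exponential backoff with exclusive access.

    Each pending device picks one uniformly random slot in the current
    window; a slot delivers a packet only if exactly one device chose it.
    The window doubles after every pass. Returns the total number of slots
    used until every packet has been delivered.
    """
    rng = random.Random(seed)
    pending = n_devices
    window = 1
    slots_used = 0
    while pending > 0:
        counts = {}
        for _ in range(pending):
            slot = rng.randrange(window)
            counts[slot] = counts.get(slot, 0) + 1
        # Collisions (two or more devices in a slot) waste that slot.
        successes = sum(1 for k in counts.values() if k == 1)
        pending -= successes
        slots_used += window
        window *= 2
    return slots_used


if __name__ == "__main__":
    n = 64
    slots = windowed_backoff_slots(n, seed=1)
    print(f"throughput = {n / slots:.3f}")
```

Because each slot delivers at most one packet in this model, the measured throughput can never exceed 1, and collisions and empty slots typically push it well below that.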
Parallel Shortest Paths with Negative Edge Weights
Nairen Cao, Jeremy T. Fineman, Katina Russell
This paper presents a parallel version of Goldberg's algorithm for the problem of single-source shortest paths with integer (possibly negative) edge weights. Given an input graph with n vertices, m edges, and integer weights ≥ -N, our algorithm solves the problem with Õ(m√n log N) work and n^(5/4+o(1)) log N span, both with high probability. Our algorithm thus has work similar to Goldberg's algorithm while also achieving at least m^(1/4-o(1)) parallelism. To generate our parallel version of Goldberg's algorithm, we solve two specific distance-limited shortest-path problems, both with work Õ(m) and span √L · n^(1/2+o(1)), where L is the distance limit.
DOI: https://doi.org/10.1145/3490148.3538583 | Published: 2022-07-11
Citations: 1
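To illustrate the problem statement in the abstract above (single-source shortest paths with possibly negative integer weights), the following sketch uses the textbook sequential Bellman-Ford algorithm. It is a swapped-in baseline with O(mn) work, not Goldberg's algorithm and not the parallel algorithm of the paper; the function name and the edge-list representation are assumptions made for the example.

```python
import math
from typing import List, Tuple


def bellman_ford(n: int, edges: List[Tuple[int, int, int]], source: int) -> List[float]:
    """Single-source shortest paths on a directed graph with n vertices.

    edges holds (u, v, weight) triples; weights may be negative.
    Raises ValueError if a negative cycle is reachable from the source.
    """
    dist = [math.inf] * n
    dist[source] = 0
    # At most n - 1 relaxation passes are needed when no negative cycle exists.
    for _ in range(n - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break
    # Any further improvement implies a reachable negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from source")
    return dist


if __name__ == "__main__":
    graph = [(0, 1, 4), (0, 2, 5), (2, 1, -3), (1, 3, 2)]
    print(bellman_ford(4, graph, 0))
```

In the small example, the negative edge from vertex 2 to vertex 1 makes the two-hop route 0 → 2 → 1 cheaper than the direct edge 0 → 1.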
Average Awake Complexity of MIS and Matching
M. Ghaffari, Julian Portmann
Chatterjee, Gmyr, and Pandurangan [PODC 2020] recently introduced the notion of awake complexity for distributed algorithms, which measures the number of rounds in which a node is awake. In the other rounds, the node is sleeping and performs no computation or communication. Measuring the number of awake rounds can be of significance in many settings of distributed computing, e.g., in sensor networks where energy consumption is of concern. In that paper, Chatterjee et al. provide an elegant randomized algorithm for the Maximal Independent Set (MIS) problem that achieves an O(1) node-averaged awake complexity. That is, the average awake time among the nodes is O(1) rounds. However, to achieve that, the algorithm sacrifices the more standard round complexity measure, going from the well-known O(log n) bound for MIS, due to Luby [STOC'85], to O(log^3.41 n) rounds. Our first contribution is to present a simple randomized distributed MIS algorithm that, with high probability, has O(1) node-averaged awake complexity and O(log n) worst-case round complexity. Our second, and more technical, contribution is to show algorithms with the same O(1) node-averaged awake complexity and O(log n) worst-case round complexity for 1+ε approximation of maximum matching and 2+ε approximation of minimum vertex cover, where ε denotes an arbitrarily small positive constant.
DOI: https://doi.org/10.1145/3490148.3538566 | Published: 2022-07-11
Citations: 9
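The two measures contrasted in the abstract above, round complexity and node-averaged awake complexity, can be seen in a sequential simulation of the random-priority variant of Luby's MIS algorithm. This is a sketch, not the algorithm of Ghaffari and Portmann or of Chatterjee et al.: it assumes, for illustration, that a node is awake in every round until it is decided and asleep afterwards, and the name `luby_mis` and the adjacency-dictionary input are hypothetical.

```python
import random
from typing import Dict, Set, Tuple


def luby_mis(adj: Dict[int, Set[int]], seed: int = 0) -> Tuple[Set[int], int, float]:
    """Simulate random-priority Luby rounds on an undirected graph.

    adj maps each node to the set of its neighbours. Returns the MIS, the
    number of rounds, and the average number of rounds a node stayed
    undecided (used here as a stand-in for its awake time).
    """
    rng = random.Random(seed)
    live = set(adj)
    mis: Set[int] = set()
    awake = {v: 0 for v in adj}
    rounds = 0
    while live:
        rounds += 1
        for v in live:
            awake[v] += 1
        # Fresh random priority per round; the node id breaks ties.
        prio = {v: (rng.random(), v) for v in live}
        winners = {
            v for v in live
            if all(prio[v] < prio[u] for u in adj[v] if u in live)
        }
        mis |= winners
        removed = set(winners)
        for v in winners:
            removed |= adj[v] & live  # neighbours of winners are decided too
        live -= removed
    avg_awake = sum(awake.values()) / len(adj) if adj else 0.0
    return mis, rounds, avg_awake
```

The live node with the smallest priority always joins the set, so every round makes progress and the simulation terminates.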
A NUMA-Aware Recoverable Mutex Lock
Ahmed I. Fahmy, W. Golab
The mutual exclusion (ME) problem has been of interest to the scientific community since it was first defined by Dijkstra. Various algorithms have been developed to solve the problem, like the MCS and CLH queue-based locks. The problem was generalized into the recoverable mutual exclusion (RME) problem by Golab and Ramaraju to accommodate the possibility of process crash failures. Since then, multiple RME algorithms have been presented in the literature that vary in design and performance. Furthermore, non-uniform memory access (NUMA) architecture has become mainstream in designing modern distributed systems, stimulating the development of NUMA-aware mutex locks. None of the existing NUMA-aware mutex locks are recoverable to the best of our knowledge. In addition, none of the transformation techniques in the literature, such as flat-combining and cohort-locking, is a black-box transformation. Precisely, each of the existing transformation techniques requires specific characteristics of, and possible modifications to, the underlying NUMA-oblivious lock. In this work, we propose the Recoverable Filter (RF) lock, a black-box transformation approach that exploits memory locality to transform a NUMA-oblivious recoverable mutex lock into a NUMA-aware one. Practical experiments are conducted using two existing RME algorithms, Golab and Hendler's (GH) and Jayanti, Jayanti, and Joshi's (JJJ). The two RME locks are transformed into NUMA-aware locks using the proposed RF and the existing cohort algorithms. Results show that, in multi-socket configurations, our transformation boosts the performance of the NUMA-oblivious RME locks by up to 45%. The RME locks transformed using the proposed RF lock are slower than their non-recoverable cohort variants by up to 9%. Outcomes demonstrate that the overhead of our algorithm is minimal when using a single socket. 
Moreover, a deeper empirical assessment shows that the gap in performance between GH and JJJ is due to the entry section of JJJ, not its exit section.
DOI: https://doi.org/10.1145/3490148.3538594 | Published: 2022-07-11
Citations: 0
Parallel Cover Trees and their Applications
Yan Gu, Zachary Napier, Yihan Sun, Letong Wang
The cover tree is the canonical data structure that efficiently maintains a dynamic set of points on a metric space and supports nearest and k-nearest neighbor searches. For most real-world datasets with reasonable distributions (mathematically, constant expansion rate and bounded aspect ratio), single-point insertion, single-point deletion, and nearest neighbor search (NNS) cost only time logarithmic in the size of the point set. Unfortunately, because the cover tree algorithms are complicated and rely on depth-first traversal order, we are unaware of any prior parallel approaches for them. This paper shows highly parallel and work-efficient cover tree algorithms that can handle batch insertions (and thus construction) and batch deletions. Assuming constant expansion rate and bounded aspect ratio, inserting m points into, or deleting m points from, a cover tree with n points takes O(m log n) expected work and polylogarithmic span with high probability. Our algorithms rely on some novel algorithmic insights. We model the insertion and deletion process as a graph and use a maximal independent set (MIS) to generate tree nodes without conflicts. We use three key ideas to guarantee work-efficiency: the prefix-doubling scheme, a careful design to limit the graph size on which we apply MIS, and a strategy to propagate information among different levels in the cover tree. We also use path-copying to make our parallel cover tree a persistent data structure, which is useful in several applications. Using our parallel cover trees, we show work-efficient (or near-work-efficient) and highly parallel solutions for a list of problems in computational geometry and machine learning, including Euclidean minimum spanning tree (EMST), single-linkage clustering, bichromatic closest pair (BCP), density-based clustering and its hierarchical version, and others.
To the best of our knowledge, many of them are the first solutions to achieve work-efficiency and polylogarithmic span assuming constant expansion rate and bounded aspect ratio.
DOI: https://doi.org/10.1145/3490148.3538581 | Published: 2022-07-11
Citations: 6
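The cover tree abstract above names prefix doubling as one of its key ideas. The sketch below shows only the batching schedule commonly associated with that term, splitting a sequence into batches whose sizes double, so that m items are processed in O(log m) batches. How the paper uses the batches inside the cover tree is not reproduced here, and the function name `prefix_doubling_batches` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def prefix_doubling_batches(items: Sequence[T]) -> List[Sequence[T]]:
    """Split items into consecutive batches of sizes 1, 2, 4, ...

    The last batch holds whatever remains. Each batch could then be
    inserted in parallel into the structure built from earlier batches.
    """
    batches = []
    start, size = 0, 1
    while start < len(items):
        batches.append(items[start:start + size])
        start += size
        size *= 2
    return batches


if __name__ == "__main__":
    for batch in prefix_doubling_batches(list(range(10))):
        print(batch)
```

In practice such schemes are often applied to a randomly permuted input, which this sketch leaves to the caller.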
Preparing for Disaster: Leveraging Precomputation to Efficiently Repair Graph Structures Upon Failures
Calvin C. Newport, N. Vaidya, A. Weaver
Distributed algorithms for constructing structures such as a maximal independent set (MIS) or maximal matching (MM) are well-studied in standard message-passing network models. In this paper, we consider a natural variant of this problem in which we begin with an instance of the graph structure and partition the subsequent algorithm execution into two stages. During the first stage, after the graph structure is calculated, some additional precomputation is done. In the second stage, an arbitrary collection of k nodes crashes. The goal is then to repair the structure as efficiently as possible. We are interested in the circumstances under which the repair can be faster than the time required to build the structure from scratch, and focus, in particular, on trade-offs in which extra precomputation rounds during the first stage can be traded for faster repairs during the second.
DOI: https://doi.org/10.1145/3490148.3538564 | Published: 2022-07-11
Citations: 0
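The repair task in the abstract above can be illustrated with a sequential greedy sketch: after some nodes crash, only the survivors that lost every MIS neighbour need attention. This is an assumption-laden illustration, not the distributed algorithms of the paper, and it performs no precomputation; the name `repair_mis` and the adjacency-dictionary input are hypothetical.

```python
from typing import Dict, Set


def repair_mis(adj: Dict[int, Set[int]], mis: Set[int], crashed: Set[int]) -> Set[int]:
    """Repair a maximal independent set after the nodes in `crashed` fail.

    adj describes the graph before the crash. Surviving MIS members stay
    in the set; survivors left with no MIS neighbour are added greedily.
    """
    survivors = set(adj) - crashed
    repaired = set(mis) - crashed
    # Only nodes whose MIS neighbours all crashed can be uncovered.
    uncovered = sorted(
        v for v in survivors
        if v not in repaired and not (adj[v] & repaired)
    )
    for v in uncovered:
        # Re-check: an earlier addition may already cover this node.
        if not (adj[v] & repaired):
            repaired.add(v)
    return repaired


if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    print(repair_mis(path, mis={1, 3}, crashed={1}))
```

On the five-node path with MIS {1, 3}, crashing node 1 leaves only node 0 uncovered, so the repair touches a single node instead of recomputing the whole set.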
Brief Announcement: Faster Stencil Computations using Gaussian Approximations
Zafar Ahmad, R. Chowdhury, Rathish Das, P. Ganapathi, Aaron Gregory, Yimin Zhu
Stencil computations are widely used to simulate the change of state of physical systems. The current best algorithm for performing aperiodic linear stencil computations on a d (≥ 1)-dimensional grid of size N for T timesteps does Θ(T N^(1-1/d) + N log N) work. We introduce novel techniques based on random walks and Gaussian approximations for an asymptotic improvement of this work bound for a class of linear stencils. We also improve the span (i.e., parallel running time on an unbounded number of processors) asymptotically from the current state of the art.
DOI: https://doi.org/10.1145/3490148.3538558 | Published: 2022-07-11
Citations: 1
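The link between linear stencils, random walks, and Gaussians mentioned in the abstract above can be seen in one dimension. Applying the averaging stencil (1/4, 1/2, 1/4) for T timesteps to a unit impulse yields the distribution of a lazy random walk after T steps, which a Gaussian with variance T/2 approximates closely. The sketch below only demonstrates this intuition on an unbounded grid; it is not the paper's algorithm, and the function names and the choice of stencil are assumptions.

```python
import math
from typing import List, Sequence


def apply_stencil(values: Sequence[float], weights: Sequence[float]) -> List[float]:
    """One timestep of a 3-point linear stencil on an unbounded 1-D grid.

    The output is two cells wider than the input because mass can spread
    one cell to the left and one to the right.
    """
    out = [0.0] * (len(values) + 2)
    for i, x in enumerate(values):
        for j, w in enumerate(weights):
            out[i + j] += x * w
    return out


def t_step_kernel(weights: Sequence[float], t: int) -> List[float]:
    """Effect of t timesteps on a unit impulse; index k + t holds offset k."""
    kernel = [1.0]
    for _ in range(t):
        kernel = apply_stencil(kernel, weights)
    return kernel


def gaussian_kernel(t: int, variance_per_step: float) -> List[float]:
    """Gaussian density sampled at offsets -t..t with variance t * variance_per_step."""
    var = t * variance_per_step
    norm = math.sqrt(2 * math.pi * var)
    return [math.exp(-k * k / (2 * var)) / norm for k in range(-t, t + 1)]


if __name__ == "__main__":
    T = 50
    exact = t_step_kernel((0.25, 0.5, 0.25), T)
    # One step moves left or right with probability 1/4 each: variance 1/2.
    approx = gaussian_kernel(T, 0.5)
    print(max(abs(x - y) for x, y in zip(exact, approx)))
```

The repeated-stencil loop costs time proportional to T squared for a single impulse, whereas the Gaussian values are obtained from a closed form.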
A Fully-Distributed Scalable Peer-to-Peer Protocol for Byzantine-Resilient Distributed Hash Tables
John E. Augustine, Soumyottam Chatterjee, Gopal Pandurangan
Performing computation in the presence of faulty and malicious nodes is a central problem in distributed computing. Over 35 years ago, Dwork, Peleg, Pippenger, and Upfal [STOC 1986, SICOMP 1988] studied the fundamental Byzantine agreement problem in sparse, bounded degree networks and presented the first protocol that achieved almost-everywhere agreement among good nodes. However, this protocol and several subsequent protocols including that of King, Saia, Sanwalani, and Vee [FOCS 2006] had the drawback that they were not fully-distributed - in those protocols, nodes are required to have initial knowledge of the entire network topology. This drawback makes such protocols not applicable to real-world communication networks such as peer-to-peer (P2P) networks, which are typically sparse and bounded degree and where nodes initially have only local knowledge of themselves and of their neighbors.
DOI: 10.1145/3490148.3538588 (https://doi.org/10.1145/3490148.3538588). Published 2022-07-11 in Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures.
Citations: 4
Performance Analysis and Modelling of Concurrent Multi-access Data Structures
A. Rukundo, A. Atalar, P. Tsigas
The major impediment to scaling concurrent data structures is memory contention when accessing shared data structure access-points, which leads to thread serialisation and hinders parallelism. Aiming to address this challenge, a significant amount of work in the literature has proposed multi-access techniques that improve concurrent data structure parallelism. However, there is little work on analysing and modelling the execution behaviour of concurrent multi-access data structures, especially in a shared memory setting. In this paper, we analyse and model the general execution behaviour of concurrent multi-access data structures in the shared memory setting. We study and analyse the behaviour of the two popular random access patterns, shared (Remote) and exclusive (Local) access, and the behaviour of the two most commonly used atomic primitives for designing lock-free data structures, Compare-and-Swap and Fetch-and-Add. We model the concurrent multi-accesses by splitting the thread execution procedure into five logical sessions: i) side-work, ii) access-point search, iii) access-point acquisition, iv) access-point data acquisition, and v) access-point data operation. We model the acquisition of an access-point as a system of closed queuing networks with parallel servers, and data acquisition in terms of where the data is located within the memory system. We evaluate our model on a set of concurrent data structure designs including a counter, a stack, and a FIFO queue. The evaluation is carried out on two state-of-the-art multi-core processors: Intel Xeon Phi CPU 7290 with 72 physical cores and Intel Xeon E5-2695 with 14 physical cores. Our model is able to predict the throughput performance of the given concurrent data structures with 80% to 100% accuracy on both architectures.
DOI: 10.1145/3490148.3538578 (https://doi.org/10.1145/3490148.3538578). Published 2022-07-11 in Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures.
Citations: 0
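The abstract of the multi-access data structures entry above describes modelling access-point acquisition as a closed queuing network. As a rough illustration of that general idea only, the sketch below runs textbook Mean Value Analysis (MVA) on a much simpler closed network: each thread alternates between side-work (a delay station) and one of several identical access-points chosen uniformly at random (single-server queues). This is not the authors' model; the function name `mva_throughput`, its parameters, and the symmetric exponential-service assumptions are all hypothetical and chosen for illustration.

```python
def mva_throughput(threads: int, side_work: float, service: float,
                   access_points: int) -> float:
    """Estimate operations per time unit with exact Mean Value Analysis.

    Assumptions (illustrative, not taken from the paper):
      - side_work: mean time a thread spends outside the data structure
      - service: mean time an operation holds one access-point
      - access_points: identical access-points, picked uniformly at random
    """
    if threads < 1 or access_points < 1:
        raise ValueError("threads and access_points must be positive")

    queue = 0.0       # mean number of threads at each access-point
    throughput = 0.0
    for n in range(1, threads + 1):
        # An arriving thread waits for those already queued, then is served.
        response = service * (1.0 + queue)
        # Little's law applied to the whole cycle (side-work + access).
        throughput = n / (side_work + response)
        # Each access-point receives a 1/access_points share of the traffic.
        queue = throughput * response / access_points
    return throughput


if __name__ == "__main__":
    for k in (1, 2, 4):
        for n in (1, 8, 64):
            x = mva_throughput(n, side_work=1.0, service=1.0, access_points=k)
            print(f"access_points={k} threads={n} throughput={x:.3f}")
```

Under these assumptions the throughput can never exceed `access_points / service`, which is one simple way to see why adding access-points relieves the serialisation caused by contention on a single one.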