
Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures: Latest Publications

Many Sequential Iterative Algorithms Can Be Parallel and (Nearly) Work-efficient
Zheqi Shen, Zijin Wan, Yan Gu, Yihan Sun
Some recent papers showed that many sequential iterative algorithms can be directly parallelized by identifying the dependences between the input objects. This approach yields many simple and practical parallel algorithms, but there are still challenges in achieving work-efficiency and high parallelism. Work-efficiency means that the number of operations is asymptotically the same as the best sequential solution. This can be hard for certain problems where the number of dependences between objects is asymptotically more than the optimal sequential work, and we cannot even afford the cost to generate them. To achieve high parallelism, we want to process as many objects as possible in parallel. The goal is to achieve O(D) span for a problem with deepest dependence length D. We refer to this property as round-efficiency. This paper presents work-efficient and round-efficient algorithms for a variety of classic problems and proposes general approaches to design them. To efficiently parallelize many sequential iterative algorithms, we propose the phase-parallel framework. The framework assigns a rank to each object and processes the objects based on the order of their ranks. All objects with the same rank can be processed in parallel. To enable work-efficiency and high parallelism, we use two types of general techniques. Type 1 algorithms use range queries to extract all objects with the same rank, which avoids evaluating all the dependences. We discuss activity selection and Dijkstra's algorithm using the Type 1 framework. Type 2 algorithms wake up an object when the last object it depends on is finished. We discuss activity selection, longest increasing subsequence (LIS), greedy maximal independent set (MIS), and many other algorithms using the Type 2 framework. All of our algorithms are (nearly) work-efficient and round-efficient, and some of them (e.g., LIS) are the first to achieve both. Many of them improve the previous best bounds. Moreover, we implement many of them for experimental studies. On inputs with reasonable dependence depth, our algorithms are highly parallel and significantly outperform their sequential counterparts.
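To make the phase-parallel framework concrete, here is a minimal sequential simulation for longest increasing subsequence (LIS), one of the Type 2 examples above: the rank of element i is the LIS length ending at i, and all elements of the same rank form one round that could be processed in parallel. This sketch is ours (the function name phase_parallel_lis and the brute-force readiness test are not from the paper); it is neither work-efficient nor parallel and only illustrates the rank/round structure.

def phase_parallel_lis(a):
    n = len(a)
    rank = [0] * n                 # 0 = not yet finalized
    remaining = set(range(n))      # objects whose rank is still unknown
    rounds = 0
    while remaining:
        rounds += 1
        # An element is ready in this round once every smaller element to its left
        # (i.e., every object it depends on) has already been finalized.
        ready = [i for i in remaining
                 if all(j not in remaining for j in range(i) if a[j] < a[i])]
        for i in ready:
            rank[i] = rounds       # LIS length ending at position i
        remaining -= set(ready)
    return rank, rounds            # rounds equals the LIS length, i.e. the dependence depth D

# Example: phase_parallel_lis([3, 1, 2, 5, 4]) returns ([1, 1, 2, 3, 3], 3).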
{"title":"Many Sequential Iterative Algorithms Can Be Parallel and (Nearly) Work-efficient","authors":"Zheqi Shen, Zijin Wan, Yan Gu, Yihan Sun","doi":"10.1145/3490148.3538574","DOIUrl":"https://doi.org/10.1145/3490148.3538574","url":null,"abstract":"Some recent papers showed that many sequential iterative algorithms can be directly parallelized, by identifying the dependences between the input objects. This approach yields many simple and practical parallel algorithms, but there are still challenges to achieve work-efficiency and high-parallelism. Work-efficiency means that the number of operations is asymptotically the same as the best sequential solution. This can be hard for certain problems where the number of dependences between objects is asymptotically more than optimal sequential work, and we cannot even afford the cost to generate them. To achieve high-parallelism, we always want it to process as many objects as possible in parallel. The goal is to achieve O (D) span for a problem with the deepest dependence length D. We refer to this property as round-efficiency. This paper presents work-efficient and round-efficient algorithms for a variety of classic problems and propose general approaches to do so. To efficiently parallelize many sequential iterative algorithms, we propose the phase-parallel framework. The framework assigns a rank to each object and processes the objects based on the order of their ranks. All objects with the same rank can be processed in parallel. To enable work-efficiency and high parallelism, we use two types of general techniques. Type 1 algorithms aim to use range queries to extract all objects with the same rank to avoid evaluating all the dependences. We discuss activity selection, and Dijkstra's algorithm using Type 1 framework. Type 2 algorithms aim to wake up an object when the last object it depends on is finished. We discuss activity selection, longest increasing subsequence (LIS), greedy maximal independent set (MIS), and many other algorithms using Type 2 framework. All of our algorithms are (nearly) work-efficient and round-efficient, and some of them (e.g., LIS) are the first to achieve the both. Many of them improve the previous best bounds. Moreover, we implement many of them for experimental studies. On inputs with reasonable dependence depth, our algorithms are highly parallelized and significantly outperform their sequential counterparts.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"260 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121217103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Competitive Algorithms for Block-Aware Caching
Christian Coester, Roie Levin, J. Naor, Ohad Talmon
Motivated by the design of real system storage hierarchies, we study the block-aware caching problem, a generalization of classic caching in which fetching (or evicting) pages from the same block incurs the same cost as fetching (or evicting) just one page from the block. Given a cache of size k, and a sequence of requests from n pages partitioned into given blocks of size β ≤ k, the goal is to minimize the total cost of fetching into (or evicting from) the cache. This problem captures generalized caching as a special case, which is already NP-hard offline. We show the following suite of results: For the eviction cost model, we show an O(log k)-approximate offline algorithm, a k-competitive deterministic online algorithm, and an O(log^2 k)-competitive randomized online algorithm. For the fetching cost model, we show an integrality gap of Ω(β) for the natural LP relaxation of the problem, and an Ω(β + log k) lower bound for randomized online algorithms. The strategy of ignoring the block structure and running a classical paging algorithm trivially achieves an O(β) approximation and an O(β log k) competitive ratio for the offline and online-randomized settings, respectively. For both the fetching and eviction models, we show improved bounds for the (h, k)-bicriteria version of the problem. In particular, when k = 2h, we match the performance of classical caching algorithms up to constant factors. Our results establish a strong separation between the tractability of the fetching and eviction cost models, which is interesting since fetching and eviction costs are the same up to an additive term in the classic caching problem. Previous work of Beckmann et al. (SPAA 21) only studied online deterministic algorithms for the fetching cost model when k > h. Our insight is to relax the block-aware caching problem to a submodular covering linear program. The main technical challenge is to maintain a competitive fractional solution to this LP, and to round it with bounded loss, as the constraints of this LP are revealed online. We hope that this framework will be useful going forward for other problems that can be captured as submodular cover.
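The block-aware fetching cost model can be stated in a few lines: a single fetch operation may bring in any set of pages from one block and costs 1, so fetching a whole block is no more expensive than fetching one of its pages. The helper below is our own illustration of this accounting (the names fetch_cost and block_of are not from the paper); it does not implement any of the algorithms above.

def fetch_cost(fetch_ops, block_of):
    """fetch_ops: list of sets of page ids fetched together; each operation must stay
    within a single block. The total fetching cost is simply the number of operations."""
    for op in fetch_ops:
        assert len({block_of[p] for p in op}) == 1, "one fetch may only touch one block"
    return len(fetch_ops)

# Example with β = 2: pages 0 and 1 form block "A", pages 2 and 3 form block "B".
block_of = {0: "A", 1: "A", 2: "B", 3: "B"}
print(fetch_cost([{0, 1}, {2}], block_of))   # 2: one fetch for block A, one for block B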
{"title":"Competitive Algorithms for Block-Aware Caching","authors":"Christian Coester, Roie Levin, J. Naor, Ohad Talmon","doi":"10.1145/3490148.3538567","DOIUrl":"https://doi.org/10.1145/3490148.3538567","url":null,"abstract":"Motivated by the design of real system storage hierarchies, we study the block-aware caching problem, a generalization of classic caching in which fetching (or evicting) pages from the same block incurs the same cost as fetching (or evicting) just one page from the block. Given a cache of size k, and a sequence of requests from n pages partitioned into given blocks of size β ≤ k, the goal is to minimize the total cost of fetching to (or evicting from) cache. This problem captures generalized caching as a special case, which is already NP-hard offline. We show the following suite of results: For the eviction cost model, we show an O(log k)-approximate offline algorithm, a k-competitive deterministic online algorithm, and an O(log2 k)-competitive randomized online algorithm. For the fetching cost model, we show an integrality gap of Ω(β) for the natural LP relaxation of the problem, and an Ω(β +log k) lower bound for randomized online algorithms. The strategy of ignoring the block-structure and running a classical paging algorithm trivially achieves an O(β) approximation and an O(β log k) competitive ratio respectively for the offline and online-randomized setting. For both fetching and eviction models, we show improved bounds for the (h, k)-bicriteria version of the problem. In particular, when k = 2h, we match the performance of classical caching algorithms up to constant factors. Our results establish a strong separation between the tractability of the fetching and eviction cost models, which is interesting since fetching/eviction costs are the same up to an additive term for the classic caching problem. Previous work of Beckmann et al. (SPAA 21) only studied online deterministic algorithms for the fetching cost model when k > h. Our insight is to relax the block-aware caching problem to a submodular covering linear program. The main technical challenge is to maintain a competitive fractional solution to this LP, and to round it with bounded loss, as the constraints of this LP are revealed online. We hope that this framework is useful going forward for other problems that can be captured as submodular cover.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129661717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
The k-Server with Preferences Problem
Jannik Castenow, Björn Feldkord, Till Knollmann, Manuel Malatyali, F. Heide
The famous k-Server Problem covers plenty of resource allocation scenarios, and several variations have been studied extensively for decades. However, to the best of our knowledge, no research has considered the setting in which the servers are not identical and requests can express which specific servers should serve them. Therefore, we present a new model that generalizes the k-Server Problem by adding preferences to the requests, and we proceed to study it in a uniform metric space for deterministic online algorithms (the special case of paging).
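As a concrete reading of the model, a request can be thought of as a pair (point, allowed servers), and in a uniform metric every server move costs 1. The sketch below is ours and only makes the model precise; the naive policy of moving the lowest-indexed allowed server is not one of the paper's algorithms and carries no competitiveness guarantee.

def serve_uniform(requests, k):
    """requests: list of (point, allowed) where allowed is a non-empty subset of
    {0, ..., k-1}; returns the total movement cost in the uniform metric."""
    pos = {s: None for s in range(k)}        # server -> current point (None = unplaced)
    cost = 0
    for point, allowed in requests:
        if any(pos[s] == point for s in allowed):
            continue                          # an allowed server already covers the point
        pos[min(allowed)] = point             # naive choice of which allowed server to move
        cost += 1                             # uniform metric: every move costs exactly 1
    return cost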
{"title":"The k-Server with Preferences Problem","authors":"Jannik Castenow, Björn Feldkord, Till Knollmann, Manuel Malatyali, F. Heide","doi":"10.1145/3490148.3538595","DOIUrl":"https://doi.org/10.1145/3490148.3538595","url":null,"abstract":"The famous k-Server Problem covers plenty of resource allocation scenarios, and several variations have been studied extensively for decades. However, to the best of our knowledge, no research has considered the problem if the servers are not identical and requests can express which specific servers should serve them. Therefore, we present a new model generalizing the k-Server Problem by preferences of the requests and proceed to study it in a uniform metric space for deterministic online algorithms (the special case of paging).","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"50 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121008680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
The Energy Complexity of Las Vegas Leader Election
Yi-Jun Chang, Shunhua Jiang
We consider the time (number of communication rounds) and energy (number of non-idle communication rounds per device) complexities of randomized leader election in a multiple-access channel, where the number of devices n ≥ 2 is unknown. It is well-known that for polynomial-time randomized leader election algorithms with success probability 1 - 1/poly(n), the optimal energy complexity is Θ(log log* n) if receivers can detect collisions, and it is Θ(log* n) otherwise.
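For intuition about the time/energy accounting in the multiple-access-channel model, here is a toy simulation of our own. It assumes n is known and keeps every device listening in every round, so each device's energy equals the number of rounds; the paper's setting has n unknown, and its algorithms are far more careful about when devices may sleep, which is how they reach the stated energy bounds.

import random

def naive_leader_election(n, seed=0):
    """Each round, every device transmits with probability 1/n; a round with exactly one
    transmitter elects that device as leader (detected via the channel feedback)."""
    rng = random.Random(seed)
    rounds = 0
    while True:
        rounds += 1
        transmitters = [i for i in range(n) if rng.random() < 1.0 / n]
        if len(transmitters) == 1:
            # time = rounds; per-device energy = rounds, since nobody ever idles here
            return transmitters[0], rounds, rounds

# With transmission probability 1/n, a round succeeds with probability roughly 1/e,
# so the expected number of rounds is a small constant.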
{"title":"The Energy Complexity of Las Vegas Leader Election","authors":"Yi-Jun Chang, Shunhua Jiang","doi":"10.1145/3490148.3538586","DOIUrl":"https://doi.org/10.1145/3490148.3538586","url":null,"abstract":"We consider the time (number of communication rounds) and energy (number of non-idle communication rounds per device) complexities of randomized leader election in a multiple-access channel, where the number of devices n ≥ 2 is unknown. It is well-known that for polynomial-time randomized leader election algorithms with success probability 1 - 1/poly(n), the optimal energy complexity is Θ(log log* n) if receivers can detect collisions, and it is Θ(log* n) otherwise.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130208636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Parallel Batch-Dynamic Minimum Spanning Forest and the Efficiency of Dynamic Agglomerative Graph Clustering
Tom Tseng, Laxman Dhulipala, Julian Shun
Hierarchical agglomerative clustering (HAC) is a popular algorithm for clustering data, but despite its importance, no dynamic algorithms for HAC with good theoretical guarantees exist. In this paper, we study dynamic HAC on edge-weighted graphs. As single-linkage HAC reduces to computing a minimum spanning forest (MSF), our first result is a parallel batch-dynamic algorithm for maintaining MSFs. On a batch of k edge insertions or deletions, our batch-dynamic MSF algorithm runs in O(k log^6 n) expected amortized work and O(log^4 n) span with high probability. It is the first fully dynamic MSF algorithm handling batches of edge updates with polylogarithmic work per update and polylogarithmic span. Using our MSF algorithm, we obtain a parallel batch-dynamic algorithm that can answer queries about single-linkage graph HAC clusters. Our second result is that dynamic graph HAC is significantly harder for other common linkage functions. For example, assuming the strong exponential time hypothesis, dynamic graph HAC requires Ω(n^(1-o(1))) work per update or query on a graph with n vertices for complete linkage, weighted average linkage, and average linkage. For complete linkage and weighted average linkage, the bound still holds even for incremental or decremental algorithms and even if we allow poly(n)-approximation. For average linkage, the bound weakens to Ω(n^(1/2-o(1))) for incremental and decremental algorithms, and the bounds still hold when allowing n^(o(1))-approximation.
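The reduction mentioned in the abstract is easy to state: single-linkage HAC on an edge-weighted graph is determined by a minimum spanning forest, because cutting the MSF edges of weight greater than a threshold t yields exactly the single-linkage clusters at t. The static, sequential sketch below (Kruskal plus union-find; the names are ours) only illustrates why maintaining an MSF suffices to answer single-linkage queries; it is not the paper's batch-dynamic algorithm.

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True

def minimum_spanning_forest(n, edges):
    """edges: list of (weight, u, v); returns the MSF edge list via Kruskal's algorithm."""
    uf, msf = UnionFind(n), []
    for w, u, v in sorted(edges):
        if uf.union(u, v):
            msf.append((w, u, v))
    return msf

def single_linkage_clusters(n, msf, t):
    """Single-linkage clusters at threshold t: connected components of the MSF edges
    with weight at most t. Returns one representative vertex per cluster."""
    uf = UnionFind(n)
    for w, u, v in msf:
        if w <= t:
            uf.union(u, v)
    return {uf.find(i) for i in range(n)}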
{"title":"Parallel Batch-Dynamic Minimum Spanning Forest and the Efficiency of Dynamic Agglomerative Graph Clustering","authors":"Tom Tseng, Laxman Dhulipala, Julian Shun","doi":"10.1145/3490148.3538584","DOIUrl":"https://doi.org/10.1145/3490148.3538584","url":null,"abstract":"Hierarchical agglomerative clustering (HAC) is a popular algorithm for clustering data, but despite its importance, no dynamic algorithms for HAC with good theoretical guarantees exist. In this paper, we study dynamic HAC on edge-weighted graphs. As single-linkage HAC reduces to computing a minimum spanning forest (MSF), our first result is a parallel batch-dynamic algorithm for maintaining MSFs. On a batch of k edge insertions or deletions, our batch-dynamic MSF algorithm runs in O(k log6 n) expected amortized work and O(log4 n) span with high probability. It is the first fully dynamic MSF algorithm handling batches of edge updates with polylogarithmic work per update and polylogarithmic span. Using our MSF algorithm, we obtain a parallel batch-dynamic algorithm that can answer queries about single-linkage graph HAC clusters. Our second result is that dynamic graph HAC is significantly harder for other common linkage functions. For example, assuming the strong exponential time hypothesis, dynamic graph HAC requires Ω(n1-o(1)) work per update or query on a graph with n vertices for complete linkage, weighted average linkage, and average linkage. For complete linkage and weighted average linkage, the bound still holds even for incremental or decremental algorithms and even if we allow poly(n)-approximation. For average linkage, the bound weakens to Ω(n1/2-o(1)) for incremental and decremental algorithms, and the bounds still hold when allowing no(1) -approximation.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"774 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115755663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Deterministic Distributed Sparse and Ultra-Sparse Spanners and Connectivity Certificates
Marcel Bezdrighin, Michael Elkin, M. Ghaffari, C. Grunau, Bernhard Haeupler, S. Ilchi, Václav Rozhoň
This paper presents efficient distributed algorithms for a number of fundamental problems in the area of graph sparsification: We provide the first deterministic distributed algorithm that computes an ultra-sparse spanner in polylog(n) rounds in weighted graphs. Concretely, our algorithm outputs a spanning subgraph with only n + o(n) edges in which the pairwise distances are stretched by a factor of at most O(log n · 2^(O(log* n))).
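To make the stretch guarantee concrete, the classic sequential greedy construction below produces a (2k-1)-spanner: an edge is kept only if the current spanner does not already connect its endpoints within (2k-1) times its weight. This is a standard textbook algorithm, not the paper's deterministic distributed construction, and it is included only to illustrate what a bounded stretch factor means.

import heapq

def bounded_distance(adj, s, t, cutoff):
    """Dijkstra distance from s to t in the current spanner, allowed to give up once
    distances exceed cutoff (any value larger than cutoff may then be returned)."""
    dist, pq = {s: 0.0}, [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")) or d > cutoff:
            continue
        if u == t:
            return d
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist.get(t, float("inf"))

def greedy_spanner(edges, k):
    """edges: list of (w, u, v); returns the edges of a (2k-1)-spanner."""
    adj, spanner = {}, []
    for w, u, v in sorted(edges):
        if bounded_distance(adj, u, v, (2 * k - 1) * w) > (2 * k - 1) * w:
            spanner.append((w, u, v))
            adj.setdefault(u, []).append((v, w))
            adj.setdefault(v, []).append((u, w))
    return spanner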
{"title":"Deterministic Distributed Sparse and Ultra-Sparse Spanners and Connectivity Certificates","authors":"Marcel Bezdrighin, Michael Elkin, M. Ghaffari, C. Grunau, Bernhard Haeupler, S. Ilchi, Václav Rozhoň","doi":"10.1145/3490148.3538565","DOIUrl":"https://doi.org/10.1145/3490148.3538565","url":null,"abstract":"This paper presents efficient distributed algorithms for a number of fundamental problems in the area of graph sparsification:We provide the first deterministic distributed algorithm that computes an ultra-sparse spanner in polylog(n) rounds in weighted graphs. Concretely, our algorithm outputs a spanning subgraph with only n + o (n) edges in which the pairwise distances are stretched by a factor of at most O(logn · 2O(log* n) ).","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"181 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115409386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Balanced Allocations in Batches: Simplified and Generalized
Dimitrios Los, Thomas Sauerwald
We consider the allocation of m balls (jobs) into n bins (servers). In the Two-Choice process, for each of m sequentially arriving balls, two randomly chosen bins are sampled and the ball is placed in the least loaded bin. It is well-known that the maximum load is m/n + log_2 log n + O(1) with high probability. Berenbrink, Czumaj, Englert, Friedetzky and Nagel [7] introduced a parallel version of this process, where m balls arrive in consecutive batches of size b = n each. Balls within the same batch are allocated in parallel, using the load information of the bins at the beginning of the batch. They proved that the gap of this process is O(log n) with high probability. In this work, we present a new analysis of this setting, which is based on exponential potential functions. This allows us to both simplify and generalize the analysis of [7] in different ways: (1) Our analysis covers a broad class of processes. This includes not only Two-Choice, but also processes with fewer bin samples like the (1 + β)-process, processes which can only receive one bit of information from each bin sample, and graphical allocation, where bins correspond to vertices in a graph. (2) Balls may be of different weights, as long as their weights are independent samples from a distribution satisfying a technical condition on its moment generating function. (3) For any batch size b ≥ n, we prove a gap of O((b/n) · log n). For any b ∈ [n, n^3], we improve this to O(b/n + log n) and show that it is tight for a family of processes. This implies the unexpected result that, e.g., for the (1 + β)-process with constant β ∈ (0, 1], the gap is Θ(log n) for all b ∈ [n, n log n]. We also conduct experiments which support our theoretical results, and even hint at a superiority of less powerful processes like (1 + β) for large batch sizes. Full version of the paper at: https://arxiv.org/abs/2203.13902.
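The batched process itself is simple to simulate: every ball in a batch samples two bins and is placed greedily according to the load snapshot taken at the start of the batch, and the quantity of interest is the gap, i.e. the maximum load minus m/n. The simulation below is our own small experiment harness, not code from the paper.

import random

def batched_two_choice(n, m, b, seed=0):
    """Allocate m balls into n bins with Two-Choice, in batches of size b that all use
    the bin loads recorded at the start of their batch; returns the final gap."""
    rng = random.Random(seed)
    load, placed = [0] * n, 0
    while placed < m:
        snapshot = load[:]                      # load information visible to this batch
        for _ in range(min(b, m - placed)):
            i, j = rng.randrange(n), rng.randrange(n)
            load[i if snapshot[i] <= snapshot[j] else j] += 1
            placed += 1
    return max(load) - m / n

# e.g. batched_two_choice(n=1_000, m=100_000, b=1_000) gives a small (logarithmic) gap.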
{"title":"Balanced Allocations in Batches: Simplified and Generalized","authors":"Dimitrios Los, Thomas Sauerwald","doi":"10.1145/3490148.3538593","DOIUrl":"https://doi.org/10.1145/3490148.3538593","url":null,"abstract":"We consider the allocation of m balls (jobs) into n bins (servers). In the Two-Choice process, for each of m sequentially arriving balls, two randomly chosen bins are sampled and the ball is placed in the least loaded bin. It is well-known that the maximum load is m/n + log2 logn + O(1) with high probability. Berenbrink, Czumaj, Englert, Friedetzky and Nagel [7] introduced a parallel version of this process, where m balls arrive in consecutive batches of size b = n each. Balls within the same batch are allocated in parallel, using the load information of the bins at the beginning of the batch. They proved that the gap of this process is O(logn) with high probability. In this work, we present a new analysis of this setting, which is based on exponential potential functions. This allows us to both simplify and generalize the analysis of [7] in different ways: (1) Our analysis covers a broad class of processes. This includes not only Two-Choice, but also processes with fewer bin samples like the (1 + β)-process, processes which can only receive one bit of information from each bin sample and graphical allocation, where bins correspond to vertices in a graph. (2) Balls may be of different weights, as long as their weights are independent samples from a distribution satisfying a technical condition on its moment generating function. (3) For any batch sizes b ≥ n, we prove a gap of is O (b/n·logn). For any b ∈ [n, n3], we improve this to is O (b/n + logn) and show that it is tight for a family of processes. This implies the unexpected result that for e.g. the (1 + β)-process with constant β ∈ (0, 1], the gap is Θ(logn) for all b ∈ [n, n logn]. We also conduct experiments which support our theoretical results, and even hint at a superiority of less powerful processes like (1+ β) for large batch sizes. Full version of the paper at: https://arxiv.org/abs/2203.13902.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124825433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Sparse Matrix Multiplication in the Low-Bandwidth Model
Chetan Gupta, J. Hirvonen, Janne H. Korhonen, Jan Studen'y, J. Suomela
We study matrix multiplication in the low-bandwidth model: There are n computers, and we need to compute the product of two n × n matrices. Initially computer i knows row i of each input matrix. In one communication round each computer can send and receive one O(log n)-bit message. Eventually computer i has to output row i of the product matrix. We seek to understand the complexity of this problem in the uniformly sparse case: each row and column of each input matrix has at most d non-zeros and in the product matrix we only need to know the values of at most d elements in each row or column. This is exactly the setting that we have, e.g., when we apply matrix multiplication for triangle detection in graphs of maximum degree d. We focus on the supported setting: the structure of the matrices is known in advance; only the numerical values of nonzero elements are unknown.
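The data layout is the crux: computer i holds row i of A and row i of B, but row i of the product A·B needs row j of B for every nonzero column j in row i of A, and that information must be moved in O(log n)-bit messages. The plain sequential helper below (our own, with matrices stored as dicts of at most d nonzeros per row) only pins down what each computer has to output, ignoring all communication.

def product_row(i, rows_A, rows_B):
    """Row i of A*B for row-sparse matrices given as lists of {column: value} dicts."""
    out = {}
    for j, a_ij in rows_A[i].items():        # nonzero columns of A's row i
        for k, b_jk in rows_B[j].items():    # requires row j of B, which computer j holds
            out[k] = out.get(k, 0) + a_ij * b_jk
    return out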
{"title":"Sparse Matrix Multiplication in the Low-Bandwidth Model","authors":"Chetan Gupta, J. Hirvonen, Janne H. Korhonen, Jan Studen'y, J. Suomela","doi":"10.1145/3490148.3538575","DOIUrl":"https://doi.org/10.1145/3490148.3538575","url":null,"abstract":"We study matrix multiplication in the low-bandwidth model: There are n computers, and we need to compute the product of two n × n matrices. Initially computer i knows row i of each input matrix. In one communication round each computer can send and receive one O(logn)-bit message. Eventually computer i has to output row i of the product matrix. We seek to understand the complexity of this problem in the uniformly sparse case: each row and column of each input matrix has at most d non-zeros and in the product matrix we only need to know the values of at most d elements in each row or column. This is exactly the setting that we have, e.g., when we apply matrix multiplication for triangle detection in graphs of maximum degree d. We focus on the supported setting: the structure of the matrices is known in advance; only the numerical values of nonzero elements are unknown.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133673385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
I/O-Optimal Algorithms for Symmetric Linear Algebra Kernels
Olivier Beaumont, Lionel Eyraud-Dubois, Mathieu Vérité, J. Langou
In this paper, we consider two fundamental symmetric kernels in linear algebra: the Cholesky factorization and the symmetric rank-k update (SYRK), with the classical three-nested-loop algorithms for these kernels. In addition, we consider a machine model with a fast memory of size S and an unbounded slow memory. In this model, all computations must be performed on operands in fast memory, and the goal is to minimize the amount of communication between the slow and fast memories. As the set of computations is fixed by the choice of the algorithm, only the ordering of the computations (the schedule) directly influences the volume of communication.
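For reference, the classical triple-loop (right-looking) Cholesky factorization that the paper analyzes looks as follows when written sequentially; the I/O question is how to order these updates so that the traffic between a fast memory of size S and the slow memory is minimized. The code is the textbook kernel, not an I/O-optimal schedule.

import math

def cholesky_in_place(A):
    """Overwrites the lower triangle of the symmetric positive-definite matrix A
    (a list of lists) with L such that L * L^T = A; the strict upper triangle is ignored."""
    n = len(A)
    for k in range(n):
        A[k][k] = math.sqrt(A[k][k])
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]                # scale the k-th column
        for j in range(k + 1, n):
            for i in range(j, n):
                A[i][j] -= A[i][k] * A[j][k]  # rank-1 update of the trailing submatrix
    return A

# Example: cholesky_in_place([[4.0, 2.0], [2.0, 3.0]]) leaves [[2.0, 2.0], [1.0, 1.414...]],
# whose lower triangle is L with L * L^T equal to the original matrix.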
{"title":"I/O-Optimal Algorithms for Symmetric Linear Algebra Kernels","authors":"Olivier Beaumont, Lionel Eyraud-Dubois, Mathieu Vérité, J. Langou","doi":"10.1145/3490148.3538587","DOIUrl":"https://doi.org/10.1145/3490148.3538587","url":null,"abstract":"In this paper, we consider two fundamental symmetric kernels in linear algebra: the Cholesky factorization and the symmetric rank-k update (SYRK), with the classical three nested loops algorithms for these kernels. In addition, we consider a machine model with a fast memory of size S and an unbounded slow memory. In this model, all computations must be performed on operands in fast memory, and the goal is to minimize the amount of communication between slow and fast memories. As the set of computations is fixed by the choice of the algorithm, only the ordering of the computations (the schedule) directly influences the volume of communications.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128547225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Permutation Predictions for Non-Clairvoyant Scheduling
Alexander Lindermayr, Nicole Megow
In non-clairvoyant scheduling, the task is to find an online strategy for scheduling jobs with a priori unknown processing requirements, with the objective of minimizing the total (weighted) completion time. We revisit this well-studied problem in the recently popular learning-augmented setting that integrates (untrusted) predictions into online algorithm design. While previous works used predictions on processing requirements, we propose a new prediction model, which provides a relative order of the jobs and which can be seen as predicting algorithmic actions rather than parts of the unknown input. We show that these predictions have desired properties, admit a natural error measure as well as algorithms with strong performance guarantees, and that they are learnable in both theory and practice. We generalize the algorithmic framework proposed in the seminal paper by Kumar et al. (NeurIPS'18) and present the first learning-augmented scheduling results for weighted jobs and unrelated machines. We demonstrate in empirical experiments the practicability and superior performance of our algorithms compared to the previously suggested single-machine algorithms.
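A one-function illustration of the prediction model: the prediction is a permutation of the jobs, read as a claimed shortest-processing-time-first order, and following a perfect permutation on a single machine recovers SPT and hence the optimal total completion time. This sketch is ours; the paper's algorithms additionally protect against erroneous permutations (for instance by combining the predicted order with round-robin), which is not shown here.

def total_completion_time(order, processing_time):
    """Sum of completion times when the jobs are run to completion in the given order."""
    t = total = 0
    for j in order:
        t += processing_time[j]
        total += t
    return total

p = {"a": 3, "b": 1, "c": 2}
spt_order = sorted(p, key=p.get)                    # a perfect permutation: b, c, a
print(total_completion_time(spt_order, p))          # 1 + 3 + 6 = 10 (optimal)
print(total_completion_time(["a", "b", "c"], p))    # 3 + 4 + 6 = 13 (a bad permutation)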
{"title":"Permutation Predictions for Non-Clairvoyant Scheduling","authors":"Alexander Lindermayr, Nicole Megow","doi":"10.1145/3490148.3538579","DOIUrl":"https://doi.org/10.1145/3490148.3538579","url":null,"abstract":"In non-clairvoyant scheduling, the task is to find an online strategy for scheduling jobs with a priori unknown processing requirements with the objective to minimize the total (weighted) completion time. We revisit this well-studied problem in a recently popular learning-augmented setting that integrates (untrusted) predictions in online algorithm design. While previous works used predictions on processing requirements, we propose a new prediction model, which provides a relative order of jobs which could be seen as predicting algorithmic actions rather than parts of the unknown input. We show that these predictions have desired properties, admit a natural error measure as well as algorithms with strong performance guarantees and that they are learnable in both, theory and practice. We generalize the algorithmic framework proposed in the seminal paper by Kumar et al. (NeurIPS'18) and present the first learning-augmented scheduling results for weighted jobs and unrelated machines. We demonstrate in empirical experiments the practicability and superior performance compared to the previously suggested single-machine algorithms.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122237721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20