{"title":"A sparsity-aware distributed-memory algorithm for sparse-sparse matrix multiplication","authors":"Yuxi Hong, Aydin Buluc","doi":"arxiv-2408.14558","DOIUrl":null,"url":null,"abstract":"Multiplying two sparse matrices (SpGEMM) is a common computational primitive\nused in many areas including graph algorithms, bioinformatics, algebraic\nmultigrid solvers, and randomized sketching. Distributed-memory parallel\nalgorithms for SpGEMM have mainly focused on sparsity-oblivious approaches that\nuse 2D and 3D partitioning. Sparsity-aware 1D algorithms can theoretically\nreduce communication by not fetching nonzeros of the sparse matrices that do\nnot participate in the multiplication. Here, we present a distributed-memory 1D SpGEMM algorithm and implementation.\nIt uses MPI RDMA operations to mitigate the cost of packing/unpacking\nsubmatrices for communication, and it uses a block fetching strategy to avoid\nexcessive fine-grained messaging. Our results show that our 1D implementation\noutperforms state-of-the-art 2D and 3D implementations within CombBLAS for many\nconfigurations, inputs, and use cases, while remaining conceptually simpler.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"64 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Distributed, Parallel, and Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.14558","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Multiplying two sparse matrices (SpGEMM) is a common computational primitive used in many areas including graph algorithms, bioinformatics, algebraic multigrid solvers, and randomized sketching. Distributed-memory parallel algorithms for SpGEMM have mainly focused on sparsity-oblivious approaches that use 2D and 3D partitioning. Sparsity-aware 1D algorithms can theoretically reduce communication by not fetching nonzeros of the sparse matrices that do not participate in the multiplication.
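To make the sparsity-aware idea concrete, the following is a minimal sketch, not the paper's code, of how a 1D row-partitioned SpGEMM (C = A * B) can determine which rows of B actually participate: a process only needs the rows of B indexed by the column indices of its local nonzeros of A. The CSR container, the block row distribution, and the helper names (LocalCSR, owner_of_row, required_b_rows) are illustrative assumptions.

```cpp
// Hedched assumption-level sketch: which remote rows of B does this rank need?
// In a 1D row layout, only rows of B whose global index appears as a column
// index in the local nonzeros of A participate in the local multiplication.
#include <cstdint>
#include <map>
#include <set>
#include <vector>

struct LocalCSR {
    std::vector<int64_t> rowptr;  // size = local_rows + 1
    std::vector<int64_t> colidx;  // global column indices of local nonzeros
    std::vector<double>  values;
};

// Map a global row index of B to the rank that owns it (block 1D layout).
inline int owner_of_row(int64_t grow, int64_t nrows_global, int nprocs) {
    int64_t rows_per_proc = (nrows_global + nprocs - 1) / nprocs;
    return static_cast<int>(grow / rows_per_proc);
}

// Collect, per remote rank, the distinct rows of B this process must fetch.
std::map<int, std::vector<int64_t>>
required_b_rows(const LocalCSR& A, int64_t nrows_global, int nprocs, int myrank) {
    std::set<int64_t> needed(A.colidx.begin(), A.colidx.end());
    std::map<int, std::vector<int64_t>> by_owner;
    for (int64_t grow : needed) {
        int owner = owner_of_row(grow, nrows_global, nprocs);
        if (owner != myrank)          // locally owned rows need no communication
            by_owner[owner].push_back(grow);
    }
    return by_owner;
}
```

The per-owner row lists produced here are what a communication phase would then request, which is where the block fetching described next comes in.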
Here, we present a distributed-memory 1D SpGEMM algorithm and implementation. It uses MPI RDMA operations to mitigate the cost of packing/unpacking submatrices for communication, and it uses a block fetching strategy to avoid excessive fine-grained messaging. Our results show that our 1D implementation outperforms state-of-the-art 2D and 3D implementations within CombBLAS for many configurations, inputs, and use cases, while remaining conceptually simpler.
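The abstract mentions MPI RDMA operations and a block fetching strategy; the sketch below only illustrates that general pattern with standard MPI one-sided calls (MPI_Win_create, MPI_Get), and is not the paper's implementation. It assumes the remote rowptr for the requested row block is already known (e.g., replicated or fetched once up front), and the struct and function names (BWindows, expose_b, fetch_block) are invented for the example.

```cpp
// Hedged sketch of block fetching via MPI one-sided (RDMA-style) operations.
// Each rank exposes its local CSR arrays of B through MPI windows; a consumer
// pulls a contiguous block of rows with a single MPI_Get per array instead of
// one message per nonzero, and the owner never packs or unpacks anything.
#include <mpi.h>
#include <cstdint>
#include <vector>

struct BWindows {
    MPI_Win colidx_win;   // window over the owner's colidx array
    MPI_Win values_win;   // window over the owner's values array
};

// Owner side: expose local CSR arrays of B (collective over comm).
// The vectors must stay alive and unresized while the windows exist;
// free the windows later with MPI_Win_free.
BWindows expose_b(std::vector<int64_t>& colidx, std::vector<double>& values,
                  MPI_Comm comm) {
    BWindows w;
    MPI_Win_create(colidx.data(), colidx.size() * sizeof(int64_t),
                   sizeof(int64_t), MPI_INFO_NULL, comm, &w.colidx_win);
    MPI_Win_create(values.data(), values.size() * sizeof(double),
                   sizeof(double), MPI_INFO_NULL, comm, &w.values_win);
    return w;
}

// Consumer side: fetch all nonzeros of rows [first_row, last_row) held by
// `owner` with one get per array, using the owner's rowptr to find the extent.
void fetch_block(const BWindows& w, int owner,
                 const std::vector<int64_t>& owner_rowptr,   // assumed known
                 int64_t first_row, int64_t last_row,
                 std::vector<int64_t>& colidx_out,
                 std::vector<double>& values_out) {
    const int64_t begin = owner_rowptr[first_row];
    const int     count = static_cast<int>(owner_rowptr[last_row] - begin);
    colidx_out.resize(count);
    values_out.resize(count);

    MPI_Win_lock(MPI_LOCK_SHARED, owner, 0, w.colidx_win);
    MPI_Get(colidx_out.data(), count, MPI_INT64_T, owner,
            begin, count, MPI_INT64_T, w.colidx_win);
    MPI_Win_unlock(owner, w.colidx_win);   // completes the transfer

    MPI_Win_lock(MPI_LOCK_SHARED, owner, 0, w.values_win);
    MPI_Get(values_out.data(), count, MPI_DOUBLE, owner,
            begin, count, MPI_DOUBLE, w.values_win);
    MPI_Win_unlock(owner, w.values_win);
}
```

Fetching a whole contiguous block of rows in one get per array is one way to avoid per-nonzero messages and owner-side packing/unpacking, which matches the motivation stated in the abstract.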