{"title":"稀疏-稀疏矩阵乘法的稀疏感知分布式内存算法","authors":"Yuxi Hong, Aydin Buluc","doi":"arxiv-2408.14558","DOIUrl":null,"url":null,"abstract":"Multiplying two sparse matrices (SpGEMM) is a common computational primitive\nused in many areas including graph algorithms, bioinformatics, algebraic\nmultigrid solvers, and randomized sketching. Distributed-memory parallel\nalgorithms for SpGEMM have mainly focused on sparsity-oblivious approaches that\nuse 2D and 3D partitioning. Sparsity-aware 1D algorithms can theoretically\nreduce communication by not fetching nonzeros of the sparse matrices that do\nnot participate in the multiplication. Here, we present a distributed-memory 1D SpGEMM algorithm and implementation.\nIt uses MPI RDMA operations to mitigate the cost of packing/unpacking\nsubmatrices for communication, and it uses a block fetching strategy to avoid\nexcessive fine-grained messaging. Our results show that our 1D implementation\noutperforms state-of-the-art 2D and 3D implementations within CombBLAS for many\nconfigurations, inputs, and use cases, while remaining conceptually simpler.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"64 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A sparsity-aware distributed-memory algorithm for sparse-sparse matrix multiplication\",\"authors\":\"Yuxi Hong, Aydin Buluc\",\"doi\":\"arxiv-2408.14558\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multiplying two sparse matrices (SpGEMM) is a common computational primitive\\nused in many areas including graph algorithms, bioinformatics, algebraic\\nmultigrid solvers, and randomized sketching. Distributed-memory parallel\\nalgorithms for SpGEMM have mainly focused on sparsity-oblivious approaches that\\nuse 2D and 3D partitioning. Sparsity-aware 1D algorithms can theoretically\\nreduce communication by not fetching nonzeros of the sparse matrices that do\\nnot participate in the multiplication. Here, we present a distributed-memory 1D SpGEMM algorithm and implementation.\\nIt uses MPI RDMA operations to mitigate the cost of packing/unpacking\\nsubmatrices for communication, and it uses a block fetching strategy to avoid\\nexcessive fine-grained messaging. 
Our results show that our 1D implementation\\noutperforms state-of-the-art 2D and 3D implementations within CombBLAS for many\\nconfigurations, inputs, and use cases, while remaining conceptually simpler.\",\"PeriodicalId\":501422,\"journal\":{\"name\":\"arXiv - CS - Distributed, Parallel, and Cluster Computing\",\"volume\":\"64 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Distributed, Parallel, and Cluster Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.14558\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Distributed, Parallel, and Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.14558","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A sparsity-aware distributed-memory algorithm for sparse-sparse matrix multiplication
Multiplying two sparse matrices (SpGEMM) is a common computational primitive used in many areas including graph algorithms, bioinformatics, algebraic multigrid solvers, and randomized sketching.
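For illustration only, a minimal sequential sketch of SpGEMM in the row-wise (Gustavson) formulation over a simple CSR layout; the struct and the dense-accumulator choice below are assumptions for exposition, not the paper's implementation.

```cpp
#include <cstddef>
#include <vector>

// Minimal CSR container; field names are illustrative.
struct Csr {
    std::size_t rows = 0, cols = 0;
    std::vector<std::size_t> rowptr;  // size rows + 1
    std::vector<std::size_t> colidx;  // size nnz
    std::vector<double> val;          // size nnz
};

// Row-wise (Gustavson) SpGEMM: C = A * B, one dense accumulator per output row.
Csr spgemm(const Csr& A, const Csr& B) {
    Csr C;
    C.rows = A.rows;
    C.cols = B.cols;
    C.rowptr.assign(A.rows + 1, 0);

    std::vector<double> acc(B.cols, 0.0);   // dense accumulator for the current row
    std::vector<char> touched(B.cols, 0);   // marks columns seen in the current row
    std::vector<std::size_t> nz;            // columns touched, in first-touch order

    for (std::size_t i = 0; i < A.rows; ++i) {
        nz.clear();
        // Expand: for every nonzero A(i,k), scale row k of B and accumulate.
        for (std::size_t p = A.rowptr[i]; p < A.rowptr[i + 1]; ++p) {
            const std::size_t k = A.colidx[p];
            const double a_ik = A.val[p];
            for (std::size_t q = B.rowptr[k]; q < B.rowptr[k + 1]; ++q) {
                const std::size_t j = B.colidx[q];
                if (!touched[j]) { touched[j] = 1; nz.push_back(j); }
                acc[j] += a_ik * B.val[q];
            }
        }
        // Compress the accumulator into row i of C (columns left unsorted) and reset.
        for (std::size_t j : nz) {
            C.colidx.push_back(j);
            C.val.push_back(acc[j]);
            acc[j] = 0.0;
            touched[j] = 0;
        }
        C.rowptr[i + 1] = C.colidx.size();
    }
    return C;
}
```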
Distributed-memory parallel algorithms for SpGEMM have mainly focused on sparsity-oblivious approaches that use 2D and 3D partitioning. Sparsity-aware 1D algorithms can theoretically reduce communication by not fetching nonzeros of the sparse matrices that do not participate in the multiplication.
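A minimal sketch of that sparsity-aware idea, assuming a block-row (1D) partition of both A and B with global column indices: each process scans its local rows of A to determine exactly which rows of B, and hence which owners, it must fetch, skipping remote nonzeros that cannot contribute to its output. The owner map and function names are hypothetical, and Csr is the struct from the sketch above.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Assumed block-row distribution: global rows [p*rows_per_rank, (p+1)*rows_per_rank)
// of both A and B live on rank p.
std::size_t owner_of_row(std::size_t global_row, std::size_t rows_per_rank) {
    return global_row / rows_per_rank;
}

// For the local piece of A (CSR with global column indices), collect the set of
// global rows of B this rank actually needs, grouped by owning rank.
// Rows of B never referenced by local A are simply never requested.
std::map<std::size_t, std::vector<std::size_t>>
rows_of_b_needed(const Csr& A_local, std::size_t rows_per_rank, std::size_t my_rank) {
    std::set<std::size_t> needed;
    for (std::size_t p = 0; p < A_local.colidx.size(); ++p)
        needed.insert(A_local.colidx[p]);   // column k of A  <=>  row k of B

    std::map<std::size_t, std::vector<std::size_t>> by_owner;
    for (std::size_t k : needed) {
        const std::size_t owner = owner_of_row(k, rows_per_rank);
        if (owner != my_rank)               // locally owned rows need no communication
            by_owner[owner].push_back(k);
    }
    return by_owner;
}
```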
Here, we present a distributed-memory 1D SpGEMM algorithm and implementation. It uses MPI RDMA operations to mitigate the cost of packing/unpacking submatrices for communication, and it uses a block fetching strategy to avoid excessive fine-grained messaging. Our results show that our 1D implementation outperforms state-of-the-art 2D and 3D implementations within CombBLAS for many configurations, inputs, and use cases, while remaining conceptually simpler.
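As a rough illustration of this communication style, the sketch below uses MPI one-sided operations (MPI_Win_create / MPI_Get) to read a contiguous block of remote nonzeros in a single call rather than many fine-grained messages; the window layout and block granularity are assumptions for exposition, not the paper's actual protocol or CombBLAS code.

```cpp
#include <mpi.h>
#include <cstddef>
#include <vector>

// Expose the local nonzero values of B in an RMA window so other ranks can
// read them directly (RDMA where the interconnect supports it).
MPI_Win expose_values(std::vector<double>& local_vals) {
    MPI_Win win;
    MPI_Win_create(local_vals.data(),
                   static_cast<MPI_Aint>(local_vals.size() * sizeof(double)),
                   sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    return win;
}

// Fetch one contiguous block of nonzeros [first, first + count) from `owner`
// with a single MPI_Get, instead of one message per nonzero or per row.
std::vector<double> fetch_block(MPI_Win win, int owner,
                                MPI_Aint first, int count) {
    std::vector<double> buf(static_cast<std::size_t>(count));
    MPI_Win_lock(MPI_LOCK_SHARED, owner, 0, win);
    MPI_Get(buf.data(), count, MPI_DOUBLE,
            owner, first, count, MPI_DOUBLE, win);
    MPI_Win_unlock(owner, win);   // completes the transfer
    return buf;
}
```

In practice the row pointers and column indices would be exposed in windows of their own, and the rows requested from a given owner would be coalesced into as few contiguous blocks as possible before issuing the gets.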