A sparsity-aware distributed-memory algorithm for sparse-sparse matrix multiplication

Yuxi Hong, Aydin Buluc
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2024-08-26. DOI: arxiv-2408.14558 (https://doi.org/arxiv-2408.14558)
Citations: 0

Abstract

Multiplying two sparse matrices (SpGEMM) is a common computational primitive used in many areas including graph algorithms, bioinformatics, algebraic multigrid solvers, and randomized sketching. Distributed-memory parallel algorithms for SpGEMM have mainly focused on sparsity-oblivious approaches that use 2D and 3D partitioning. Sparsity-aware 1D algorithms can theoretically reduce communication by not fetching nonzeros of the sparse matrices that do not participate in the multiplication. Here, we present a distributed-memory 1D SpGEMM algorithm and implementation. It uses MPI RDMA operations to mitigate the cost of packing/unpacking submatrices for communication, and it uses a block fetching strategy to avoid excessive fine-grained messaging. Our results show that our 1D implementation outperforms state-of-the-art 2D and 3D implementations within CombBLAS for many configurations, inputs, and use cases, while remaining conceptually simpler.
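The sparsity-aware idea in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: it assumes a column-wise 1D partition in which each process owns some columns of A and B, and it stands in for MPI RDMA fetches with plain dictionary lookups. The function names (`needed_columns`, `group_into_blocks`, `local_spgemm`) and the dict-of-columns storage are hypothetical choices made for illustration. To compute its columns of C = A·B, a process only needs column k of A if some locally owned column of B has a nonzero in row k; grouping those indices into fixed-size blocks is one simple way to avoid issuing one request per column.

```python
from collections import defaultdict


def needed_columns(local_b_cols):
    """Return the sorted row indices k with a nonzero in any local column of B.

    local_b_cols maps a column index j to {row k: value}. Only these columns
    of A can contribute to the local columns of C.
    """
    return sorted({k for col in local_b_cols.values() for k in col})


def group_into_blocks(indices, block_size):
    """Map column indices to fixed-size block ids (assumed blocking scheme).

    Each block id would be fetched once, instead of one message per column.
    """
    return sorted({k // block_size for k in indices})


def local_spgemm(a_cols, local_b_cols):
    """Column-by-column sparse product using whatever columns of A are given.

    a_cols maps a column index k to {row i: value}; missing columns are
    treated as not needed (or empty).
    """
    c_cols = {}
    for j, b_col in local_b_cols.items():
        acc = defaultdict(float)  # sparse accumulator for column j of C
        for k, b_kj in b_col.items():
            for i, a_ik in a_cols.get(k, {}).items():
                acc[i] += a_ik * b_kj
        if acc:
            c_cols[j] = dict(acc)
    return c_cols


# Toy data: a 4x4 matrix A stored by column, and two local columns of B.
a_cols = {0: {0: 1.0}, 1: {1: 2.0}, 2: {0: 3.0, 3: 4.0}, 3: {2: 5.0}}
local_b = {0: {2: 1.0}, 1: {2: 2.0, 3: 1.0}}

needed = needed_columns(local_b)                # columns of A that participate
blocks = group_into_blocks(needed, block_size=2)
fetched = {k: a_cols[k] for k in needed}        # stand-in for a remote fetch
c_local = local_spgemm(fetched, local_b)
```

In this toy case only columns 2 and 3 of A are touched, both of which fall in a single block, and the result matches what multiplying against all of A would give. A real distributed version would replace the `fetched` lookup with one-sided reads from the owning process.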