{"title":"A sparsity-aware distributed-memory algorithm for sparse-sparse matrix multiplication","authors":"Yuxi Hong, Aydin Buluc","doi":"arxiv-2408.14558","DOIUrl":null,"url":null,"abstract":"Multiplying two sparse matrices (SpGEMM) is a common computational primitive\nused in many areas including graph algorithms, bioinformatics, algebraic\nmultigrid solvers, and randomized sketching. Distributed-memory parallel\nalgorithms for SpGEMM have mainly focused on sparsity-oblivious approaches that\nuse 2D and 3D partitioning. Sparsity-aware 1D algorithms can theoretically\nreduce communication by not fetching nonzeros of the sparse matrices that do\nnot participate in the multiplication. Here, we present a distributed-memory 1D SpGEMM algorithm and implementation.\nIt uses MPI RDMA operations to mitigate the cost of packing/unpacking\nsubmatrices for communication, and it uses a block fetching strategy to avoid\nexcessive fine-grained messaging. Our results show that our 1D implementation\noutperforms state-of-the-art 2D and 3D implementations within CombBLAS for many\nconfigurations, inputs, and use cases, while remaining conceptually simpler.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"64 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Distributed, Parallel, and Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.14558","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Multiplying two sparse matrices (SpGEMM) is a common computational primitive used in many areas including graph algorithms, bioinformatics, algebraic multigrid solvers, and randomized sketching. Distributed-memory parallel algorithms for SpGEMM have mainly focused on sparsity-oblivious approaches that use 2D and 3D partitioning. Sparsity-aware 1D algorithms can theoretically reduce communication by not fetching nonzeros of the sparse matrices that do not participate in the multiplication.
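To make the sparsity-aware idea concrete, the following is a minimal sketch, not the paper's code, of how a 1D row-partitioned SpGEMM (C = A * B) can determine which rows of B actually participate: a process only needs the rows of B indexed by the column indices of its local nonzeros of A. The CSR container, the block row distribution, and the helper names (LocalCSR, owner_of_row, required_b_rows) are illustrative assumptions.

```cpp
// Hedched assumption-level sketch: which remote rows of B does this rank need?
// In a 1D row layout, only rows of B whose global index appears as a column
// index in the local nonzeros of A participate in the local multiplication.
#include <cstdint>
#include <map>
#include <set>
#include <vector>

struct LocalCSR {
    std::vector<int64_t> rowptr;  // size = local_rows + 1
    std::vector<int64_t> colidx;  // global column indices of local nonzeros
    std::vector<double>  values;
};

// Map a global row index of B to the rank that owns it (block 1D layout).
inline int owner_of_row(int64_t grow, int64_t nrows_global, int nprocs) {
    int64_t rows_per_proc = (nrows_global + nprocs - 1) / nprocs;
    return static_cast<int>(grow / rows_per_proc);
}

// Collect, per remote rank, the distinct rows of B this process must fetch.
std::map<int, std::vector<int64_t>>
required_b_rows(const LocalCSR& A, int64_t nrows_global, int nprocs, int myrank) {
    std::set<int64_t> needed(A.colidx.begin(), A.colidx.end());
    std::map<int, std::vector<int64_t>> by_owner;
    for (int64_t grow : needed) {
        int owner = owner_of_row(grow, nrows_global, nprocs);
        if (owner != myrank)          // locally owned rows need no communication
            by_owner[owner].push_back(grow);
    }
    return by_owner;
}
```

The per-owner row lists produced here are what a communication phase would then request, which is where the block fetching described next comes in.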
Here, we present a distributed-memory 1D SpGEMM algorithm and implementation. It uses MPI RDMA operations to mitigate the cost of packing/unpacking submatrices for communication, and it uses a block fetching strategy to avoid excessive fine-grained messaging. Our results show that our 1D implementation outperforms state-of-the-art 2D and 3D implementations within CombBLAS for many configurations, inputs, and use cases, while remaining conceptually simpler.
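The abstract mentions MPI RDMA operations and a block fetching strategy; the sketch below only illustrates that general pattern with standard MPI one-sided calls (MPI_Win_create, MPI_Get), and is not the paper's implementation. It assumes the remote rowptr for the requested row block is already known (e.g., replicated or fetched once up front), and the struct and function names (BWindows, expose_b, fetch_block) are invented for the example.

```cpp
// Hedged sketch of block fetching via MPI one-sided (RDMA-style) operations.
// Each rank exposes its local CSR arrays of B through MPI windows; a consumer
// pulls a contiguous block of rows with a single MPI_Get per array instead of
// one message per nonzero, and the owner never packs or unpacks anything.
#include <mpi.h>
#include <cstdint>
#include <vector>

struct BWindows {
    MPI_Win colidx_win;   // window over the owner's colidx array
    MPI_Win values_win;   // window over the owner's values array
};

// Owner side: expose local CSR arrays of B (collective over comm).
// The vectors must stay alive and unresized while the windows exist;
// free the windows later with MPI_Win_free.
BWindows expose_b(std::vector<int64_t>& colidx, std::vector<double>& values,
                  MPI_Comm comm) {
    BWindows w;
    MPI_Win_create(colidx.data(), colidx.size() * sizeof(int64_t),
                   sizeof(int64_t), MPI_INFO_NULL, comm, &w.colidx_win);
    MPI_Win_create(values.data(), values.size() * sizeof(double),
                   sizeof(double), MPI_INFO_NULL, comm, &w.values_win);
    return w;
}

// Consumer side: fetch all nonzeros of rows [first_row, last_row) held by
// `owner` with one get per array, using the owner's rowptr to find the extent.
void fetch_block(const BWindows& w, int owner,
                 const std::vector<int64_t>& owner_rowptr,   // assumed known
                 int64_t first_row, int64_t last_row,
                 std::vector<int64_t>& colidx_out,
                 std::vector<double>& values_out) {
    const int64_t begin = owner_rowptr[first_row];
    const int     count = static_cast<int>(owner_rowptr[last_row] - begin);
    colidx_out.resize(count);
    values_out.resize(count);

    MPI_Win_lock(MPI_LOCK_SHARED, owner, 0, w.colidx_win);
    MPI_Get(colidx_out.data(), count, MPI_INT64_T, owner,
            begin, count, MPI_INT64_T, w.colidx_win);
    MPI_Win_unlock(owner, w.colidx_win);   // completes the transfer

    MPI_Win_lock(MPI_LOCK_SHARED, owner, 0, w.values_win);
    MPI_Get(values_out.data(), count, MPI_DOUBLE, owner,
            begin, count, MPI_DOUBLE, w.values_win);
    MPI_Win_unlock(owner, w.values_win);
}
```

Fetching a whole contiguous block of rows in one get per array is one way to avoid per-nonzero messages and owner-side packing/unpacking, which matches the motivation stated in the abstract.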