{"title":"稀疏-稀疏矩阵乘法的稀疏感知分布式内存算法","authors":"Yuxi Hong, Aydin Buluc","doi":"arxiv-2408.14558","DOIUrl":null,"url":null,"abstract":"Multiplying two sparse matrices (SpGEMM) is a common computational primitive\nused in many areas including graph algorithms, bioinformatics, algebraic\nmultigrid solvers, and randomized sketching. Distributed-memory parallel\nalgorithms for SpGEMM have mainly focused on sparsity-oblivious approaches that\nuse 2D and 3D partitioning. Sparsity-aware 1D algorithms can theoretically\nreduce communication by not fetching nonzeros of the sparse matrices that do\nnot participate in the multiplication. Here, we present a distributed-memory 1D SpGEMM algorithm and implementation.\nIt uses MPI RDMA operations to mitigate the cost of packing/unpacking\nsubmatrices for communication, and it uses a block fetching strategy to avoid\nexcessive fine-grained messaging. Our results show that our 1D implementation\noutperforms state-of-the-art 2D and 3D implementations within CombBLAS for many\nconfigurations, inputs, and use cases, while remaining conceptually simpler.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"64 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A sparsity-aware distributed-memory algorithm for sparse-sparse matrix multiplication\",\"authors\":\"Yuxi Hong, Aydin Buluc\",\"doi\":\"arxiv-2408.14558\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multiplying two sparse matrices (SpGEMM) is a common computational primitive\\nused in many areas including graph algorithms, bioinformatics, algebraic\\nmultigrid solvers, and randomized sketching. Distributed-memory parallel\\nalgorithms for SpGEMM have mainly focused on sparsity-oblivious approaches that\\nuse 2D and 3D partitioning. Sparsity-aware 1D algorithms can theoretically\\nreduce communication by not fetching nonzeros of the sparse matrices that do\\nnot participate in the multiplication. Here, we present a distributed-memory 1D SpGEMM algorithm and implementation.\\nIt uses MPI RDMA operations to mitigate the cost of packing/unpacking\\nsubmatrices for communication, and it uses a block fetching strategy to avoid\\nexcessive fine-grained messaging. 
Our results show that our 1D implementation\\noutperforms state-of-the-art 2D and 3D implementations within CombBLAS for many\\nconfigurations, inputs, and use cases, while remaining conceptually simpler.\",\"PeriodicalId\":501422,\"journal\":{\"name\":\"arXiv - CS - Distributed, Parallel, and Cluster Computing\",\"volume\":\"64 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Distributed, Parallel, and Cluster Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.14558\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Distributed, Parallel, and Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.14558","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A sparsity-aware distributed-memory algorithm for sparse-sparse matrix multiplication
Multiplying two sparse matrices (SpGEMM) is a common computational primitive used in many areas including graph algorithms, bioinformatics, algebraic multigrid solvers, and randomized sketching.
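For illustration only, a minimal sequential sketch of SpGEMM in the row-wise (Gustavson) formulation over a simple CSR layout; the struct and the dense-accumulator choice below are assumptions for exposition, not the paper's implementation.

```cpp
#include <cstddef>
#include <vector>

// Minimal CSR container; field names are illustrative.
struct Csr {
    std::size_t rows = 0, cols = 0;
    std::vector<std::size_t> rowptr;  // size rows + 1
    std::vector<std::size_t> colidx;  // size nnz
    std::vector<double> val;          // size nnz
};

// Row-wise (Gustavson) SpGEMM: C = A * B, one dense accumulator per output row.
Csr spgemm(const Csr& A, const Csr& B) {
    Csr C;
    C.rows = A.rows;
    C.cols = B.cols;
    C.rowptr.assign(A.rows + 1, 0);

    std::vector<double> acc(B.cols, 0.0);   // dense accumulator for the current row
    std::vector<char> touched(B.cols, 0);   // marks columns seen in the current row
    std::vector<std::size_t> nz;            // columns touched, in first-touch order

    for (std::size_t i = 0; i < A.rows; ++i) {
        nz.clear();
        // Expand: for every nonzero A(i,k), scale row k of B and accumulate.
        for (std::size_t p = A.rowptr[i]; p < A.rowptr[i + 1]; ++p) {
            const std::size_t k = A.colidx[p];
            const double a_ik = A.val[p];
            for (std::size_t q = B.rowptr[k]; q < B.rowptr[k + 1]; ++q) {
                const std::size_t j = B.colidx[q];
                if (!touched[j]) { touched[j] = 1; nz.push_back(j); }
                acc[j] += a_ik * B.val[q];
            }
        }
        // Compress the accumulator into row i of C (columns left unsorted) and reset.
        for (std::size_t j : nz) {
            C.colidx.push_back(j);
            C.val.push_back(acc[j]);
            acc[j] = 0.0;
            touched[j] = 0;
        }
        C.rowptr[i + 1] = C.colidx.size();
    }
    return C;
}
```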
Distributed-memory parallel algorithms for SpGEMM have mainly focused on sparsity-oblivious approaches that use 2D and 3D partitioning. Sparsity-aware 1D algorithms can theoretically reduce communication by not fetching nonzeros of the sparse matrices that do not participate in the multiplication.
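A minimal sketch of that sparsity-aware idea, assuming a block-row (1D) partition of both A and B with global column indices: each process scans its local rows of A to determine exactly which rows of B, and hence which owners, it must fetch, skipping remote nonzeros that cannot contribute to its output. The owner map and function names are hypothetical, and Csr is the struct from the sketch above.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Assumed block-row distribution: global rows [p*rows_per_rank, (p+1)*rows_per_rank)
// of both A and B live on rank p.
std::size_t owner_of_row(std::size_t global_row, std::size_t rows_per_rank) {
    return global_row / rows_per_rank;
}

// For the local piece of A (CSR with global column indices), collect the set of
// global rows of B this rank actually needs, grouped by owning rank.
// Rows of B never referenced by local A are simply never requested.
std::map<std::size_t, std::vector<std::size_t>>
rows_of_b_needed(const Csr& A_local, std::size_t rows_per_rank, std::size_t my_rank) {
    std::set<std::size_t> needed;
    for (std::size_t p = 0; p < A_local.colidx.size(); ++p)
        needed.insert(A_local.colidx[p]);   // column k of A  <=>  row k of B

    std::map<std::size_t, std::vector<std::size_t>> by_owner;
    for (std::size_t k : needed) {
        const std::size_t owner = owner_of_row(k, rows_per_rank);
        if (owner != my_rank)               // locally owned rows need no communication
            by_owner[owner].push_back(k);
    }
    return by_owner;
}
```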
Here, we present a distributed-memory 1D SpGEMM algorithm and implementation. It uses MPI RDMA operations to mitigate the cost of packing/unpacking submatrices for communication, and it uses a block fetching strategy to avoid excessive fine-grained messaging. Our results show that our 1D implementation outperforms state-of-the-art 2D and 3D implementations within CombBLAS for many configurations, inputs, and use cases, while remaining conceptually simpler.
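As a rough illustration of this communication style, the sketch below uses MPI one-sided operations (MPI_Win_create / MPI_Get) to read a contiguous block of remote nonzeros in a single call rather than many fine-grained messages; the window layout and block granularity are assumptions for exposition, not the paper's actual protocol or CombBLAS code.

```cpp
#include <mpi.h>
#include <cstddef>
#include <vector>

// Expose the local nonzero values of B in an RMA window so other ranks can
// read them directly (RDMA where the interconnect supports it).
MPI_Win expose_values(std::vector<double>& local_vals) {
    MPI_Win win;
    MPI_Win_create(local_vals.data(),
                   static_cast<MPI_Aint>(local_vals.size() * sizeof(double)),
                   sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    return win;
}

// Fetch one contiguous block of nonzeros [first, first + count) from `owner`
// with a single MPI_Get, instead of one message per nonzero or per row.
std::vector<double> fetch_block(MPI_Win win, int owner,
                                MPI_Aint first, int count) {
    std::vector<double> buf(static_cast<std::size_t>(count));
    MPI_Win_lock(MPI_LOCK_SHARED, owner, 0, win);
    MPI_Get(buf.data(), count, MPI_DOUBLE,
            owner, first, count, MPI_DOUBLE, win);
    MPI_Win_unlock(owner, win);   // completes the transfer
    return buf;
}
```

In practice the row pointers and column indices would be exposed in windows of their own, and the rows requested from a given owner would be coalesced into as few contiguous blocks as possible before issuing the gets.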