$O(N)$ distributed direct factorization of structured dense matrices using runtime systems

arXiv - CS - Mathematical Software, Pub Date: 2023-11-02, DOI: arxiv-2311.00921

Sameer Deshmukh, Qinxiang Ma, Rio Yokota, George Bosilca


Citations: 0

Abstract



$O(N)$ distributed direct factorization of structured dense matrices using runtime systems

Structured dense matrices result from boundary integral problems in electrostatics and geostatistics, and also Schur complements in sparse preconditioners such as multi-frontal methods. Exploiting the structure of such matrices can reduce the time for dense direct factorization from $O(N^3)$ to $O(N)$. The Hierarchically Semi-Separable (HSS) matrix is one such low rank matrix format that can be factorized using a Cholesky-like algorithm called ULV factorization. The HSS-ULV algorithm is highly parallel because it removes the dependency on trailing sub-matrices at each HSS level. However, a key merge step that links two successive HSS levels remains a challenge for efficient parallelization. In this paper, we use an asynchronous runtime system PaRSEC with the HSS-ULV algorithm. We compare our work with STRUMPACK and LORAPO, both state-of-the-art implementations of dense direct low rank factorization, and achieve up to 2x better factorization time for matrices arising from a diverse set of applications on up to 128 nodes of Fugaku for similar or better accuracy for all the problems that we survey.
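As a rough illustration of the structure the abstract refers to, the sketch below (not taken from the paper) builds a small dense kernel matrix on a 1D point set and compares the numerical rank of an off-diagonal block with that of a diagonal block. Low-rank off-diagonal blocks are the property that HSS-type formats exploit. The kernel, the point distribution, the tolerance, and the helper names `kernel_matrix` and `numerical_rank` are all illustrative assumptions.

```python
import numpy as np


def kernel_matrix(points, shift=1e-3):
    """Dense matrix K[i, j] = 1 / (|x_i - x_j| + shift).

    The kernel and the shift are hypothetical choices for illustration only.
    """
    diff = np.abs(points[:, None] - points[None, :])
    return 1.0 / (diff + shift)


def numerical_rank(block, tol=1e-8):
    """Count singular values larger than tol times the largest one."""
    s = np.linalg.svd(block, compute_uv=False)
    return int(np.sum(s > tol * s[0]))


n = 256
points = np.linspace(0.0, 1.0, n)  # uniformly spaced 1D points (assumption)
K = kernel_matrix(points)

half = n // 2
diag_block = K[:half, :half]      # interactions within the first cluster
off_diag_block = K[:half, half:]  # interactions between the two clusters

rank_diag = numerical_rank(diag_block)
rank_off = numerical_rank(off_diag_block)

print(f"diagonal block:     {half} x {half}, numerical rank {rank_diag}")
print(f"off-diagonal block: {half} x {half}, numerical rank {rank_off}")
```

In this toy setting the diagonal block is full rank while the off-diagonal block can be compressed to a small number of singular vectors. Applying that compression recursively over a hierarchy of clusters is the general idea behind hierarchical low-rank formats such as HSS. The sketch does not perform the ULV factorization itself.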

