Sameer Deshmukh, Qinxiang Ma, Rio Yokota, George Bosilca
arXiv - CS - Mathematical Software, published 2023-11-02. DOI: arxiv-2311.00921 (https://doi.org/arxiv-2311.00921)
$O(N)$ distributed direct factorization of structured dense matrices using runtime systems
Structured dense matrices result from boundary integral problems in
electrostatics and geostatistics, and also Schur complements in sparse
preconditioners such as multi-frontal methods. Exploiting the structure of such
matrices can reduce the time for dense direct factorization from $O(N^3)$ to
$O(N)$. The Hierarchically Semi-Separable (HSS) matrix is one such low-rank
matrix format that can be factorized using a Cholesky-like algorithm called ULV
factorization. The HSS-ULV algorithm is highly parallel because it removes the
dependency on trailing sub-matrices at each HSS level. However, a key merge
step that links two successive HSS levels remains a challenge for efficient
parallelization. In this paper, we combine the asynchronous runtime system PaRSEC
with the HSS-ULV algorithm. We compare our work with STRUMPACK and LORAPO, both
state-of-the-art implementations of dense direct low-rank factorization, and
achieve up to 2x faster factorization for matrices arising from a diverse
set of applications on up to 128 nodes of Fugaku, at similar or better accuracy
for all the problems that we survey.
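
The structure that formats such as HSS exploit is that off-diagonal blocks of these matrices are numerically low rank. The following is a minimal sketch of that property only, not of the HSS-ULV algorithm or of the PaRSEC implementation. The kernel 1 / (1 + |x - y|), the point set, the matrix size, and the tolerance are all assumptions chosen for illustration and do not come from the paper.

```python
import numpy as np


def kernel_matrix(points_x, points_y):
    # Hypothetical smooth kernel chosen for illustration: 1 / (1 + |x - y|).
    diff = np.abs(points_x[:, None] - points_y[None, :])
    return 1.0 / (1.0 + diff)


def numerical_rank(block, tol=1e-8):
    # Count singular values above a tolerance relative to the largest one.
    s = np.linalg.svd(block, compute_uv=False)
    return int(np.sum(s > tol * s[0]))


n = 512                                # assumed problem size
points = np.linspace(0.0, 1.0, n)      # assumed 1D geometry
A = kernel_matrix(points, points)

half = n // 2
diag_block = A[:half, :half]           # interactions within one cluster
off_diag_block = A[:half, half:]       # interactions between two clusters

rank_diag = numerical_rank(diag_block)
rank_off = numerical_rank(off_diag_block)

print("block size:", half)
print("numerical rank of diagonal block:", rank_diag)
print("numerical rank of off-diagonal block:", rank_off)
```

Under these assumptions the off-diagonal block needs far fewer singular vectors than its dimension, which is what makes compressed storage and faster factorization possible. A hierarchical format applies the same idea recursively to the diagonal blocks.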