Efficient Algebraic Multigrid Preconditioners on Clusters of GPUs

Parallel Process. Lett. | Pub Date: 2019-05-10 | DOI: 10.1142/S0129626419500014

A. A. Hassan, V. Cardellini, P. D'Ambra, D. Serafino, S. Filippone


Citations: 4



Efficient Algebraic Multigrid Preconditioners on Clusters of GPUs

Many scientific applications require the solution of large and sparse linear systems of equations using Krylov subspace methods; in this case, the choice of an effective preconditioner may be crucial for the convergence of the Krylov solver. Algebraic MultiGrid (AMG) methods are widely used as preconditioners because of their optimal computational cost and their algorithmic scalability. The wide availability of GPUs, now found in many of the fastest supercomputers, poses the problem of implementing these methods efficiently on high-throughput processors. In this work we focus on the application phase of AMG preconditioners, and in particular on the choice and implementation of smoothers and coarsest-level solvers capable of exploiting the computational power of clusters of GPUs. We consider block-Jacobi smoothers that use sparse approximate inverses in the solve phase associated with the local blocks. The choice of approximate inverses instead of sparse matrix factorizations is driven by the large amount of parallelism exposed by the matrix-vector product, as compared to the solution of large triangular systems on GPUs. The selected smoothers and solvers are implemented within the AMG preconditioning framework provided by the MLD2P4 library, using suitable sparse matrix data structures from the PSBLAS library. Their behaviour is illustrated in terms of execution speed and scalability on a test case concerning groundwater modelling, provided by the Jülich Supercomputing Center within the Horizon 2020 Project EoCoE.
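The idea of replacing a local triangular solve with matrix-vector products can be illustrated with a small sketch. The code below is not the MLD2P4 implementation and does not use the approximate inverses studied in the paper; as an assumption made for brevity, it uses a first-order Neumann-series approximation M = (2I - D^-1 A) D^-1 of the inverse of a single local block A with diagonal D. All function names are hypothetical, and a dense pure-Python product stands in for a sparse GPU kernel.

```python
def matvec(A, x):
    """Dense matrix-vector product; stands in for a sparse GPU kernel."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def apply_approx_inverse(A, r):
    """Apply M = (2I - D^-1 A) D^-1, a first-order Neumann approximation of A^-1.

    Only diagonal scalings and one matrix-vector product are needed;
    no triangular system is solved.
    """
    d = [A[i][i] for i in range(len(A))]
    y = [ri / di for ri, di in zip(r, d)]          # y = D^-1 r
    Ay = matvec(A, y)                              # the only matvec in the apply
    return [2.0 * yi - ayi / di for yi, ayi, di in zip(y, Ay, d)]


def smooth(A, b, x, sweeps):
    """Run `sweeps` iterations of x <- x + M (b - A x) on one local block."""
    for _ in range(sweeps):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        x = [xi + ci for xi, ci in zip(x, apply_approx_inverse(A, r))]
    return x


def residual_norm(A, b, x):
    """Euclidean norm of the residual b - A x."""
    return sum((bi - axi) ** 2 for bi, axi in zip(b, matvec(A, x))) ** 0.5


def poisson_1d(n):
    """Tridiagonal (-1, 2, -1) test matrix, used here as a toy local block."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


A = poisson_1d(8)
b = [1.0] * 8
x0 = [0.0] * 8
x4 = smooth(A, b, x0, sweeps=4)
print(residual_norm(A, b, x0), residual_norm(A, b, x4))
```

Each sweep consists only of matrix-vector products and element-wise vector updates, which is the kind of operation the abstract describes as exposing more parallelism on GPUs than a triangular solve. In a block-Jacobi setting, each process would apply such a sweep to its own diagonal block.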

