Efficient GMRES+AMG on GPUs: Composite Smoothers and Mixed [math]-Cycles

SIAM Journal on Scientific Computing, Ahead of Print. Pub Date: 2024-09-03. DOI: 10.1137/23m1578632

Stephen Thomas, Allison H. Baker


Citations: 0


Efficient GMRES+AMG on GPUs: Composite Smoothers And Mixed [math]-Cycles

SIAM Journal on Scientific Computing, Ahead of Print.
Abstract. In this study, we introduce algorithms optimized for GPU architectures, aimed at efficiently solving large sparse linear systems, a central challenge in Navier–Stokes pressure projection problems. Our approach includes an adaptation of the GMRES algorithm, drawing inspiration from the merged vector operations first proposed by Bielich et al. [Parallel Comput., 112 (2022), 102940]. This adaptation increases computational intensity on GPU platforms through optimized vector update strategies. The algorithm incorporates modified and classical Gram–Schmidt methods with an algebraic multigrid (AMG) preconditioner, each tailored for GPU performance. A key innovation in our work is the development of a Gram–Schmidt projector [math] employing a rank-1 perturbation of the identity matrix. Designed to maximize the high memory bandwidth utilization of the AMD MI-250X GPU, this approach includes a strategy for treating the unit diagonal that minimizes memory reads, leading to a 25% increase in computational efficiency. The application of perturbation theory further ensures that orthogonality loss is limited to [math], where [math] is the number of iterations. Additionally, we introduce a mixed AMG [math]-cycle strategy combining ILU(0) and [math]-Jacobi smoothers, which achieves a 30–50% reduction in GPU compute times compared to conventional methods, while maintaining low backward error. This strategy, alongside our novel treatment of the diagonal in triangular matrices, marks a substantial increase in AMG efficiency for GPU systems. We believe that these contributions represent a significant advance in optimizing GMRES+AMG algorithms for GPU computations. The empirical results demonstrate notable speed increments and maintain rigorous backward error bounds, underscoring the potential of our methods to substantially increase computational efficiency in large-scale scientific applications.
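The abstract refers to both classical and modified Gram–Schmidt and to bounding the loss of orthogonality. As a rough illustration only, the sketch below contrasts the two textbook variants on a small, nearly rank-deficient test matrix. It is not the paper's GPU projector or its merged-operation GMRES; the function names (`classical_gs`, `modified_gs`, `orthogonality_loss`, `lauchli_columns`) and the test matrix are assumptions introduced here for the example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit Euclidean length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def classical_gs(columns):
    """Classical Gram-Schmidt: every coefficient is computed from the
    original column, so the inner products can be batched together."""
    basis = []
    for a in columns:
        coeffs = [dot(q, a) for q in basis]
        w = list(a)
        for c, q in zip(coeffs, basis):
            w = [wi - c * qi for wi, qi in zip(w, q)]
        basis.append(normalize(w))
    return basis


def modified_gs(columns):
    """Modified Gram-Schmidt: apply the rank-1 projectors (I - q q^T)
    one after another to the partially projected vector."""
    basis = []
    for a in columns:
        w = list(a)
        for q in basis:
            c = dot(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        basis.append(normalize(w))
    return basis


def orthogonality_loss(basis):
    """Largest absolute entry of I - Q^T Q."""
    worst = 0.0
    for i, qi in enumerate(basis):
        for j, qj in enumerate(basis):
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(dot(qi, qj) - target))
    return worst


def lauchli_columns(eps):
    """Columns of a Lauchli-type matrix (hypothetical test case):
    nearly parallel vectors that stress orthogonalization."""
    return [
        [1.0, eps, 0.0, 0.0],
        [1.0, 0.0, eps, 0.0],
        [1.0, 0.0, 0.0, eps],
    ]


cols = lauchli_columns(1e-8)
loss_cgs = orthogonality_loss(classical_gs(cols))
loss_mgs = orthogonality_loss(modified_gs(cols))
print("classical GS loss:", loss_cgs)
print("modified GS loss: ", loss_mgs)
```

On this input the classical variant loses orthogonality almost completely between the second and third vectors, while the modified variant keeps the loss near the size of `eps`. That gap is the usual motivation for pairing a batched, GPU-friendly projection with an explicit bound on orthogonality loss.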
