GPU 上的高效 GMRES+AMG:复合平滑器和混合[数学]循环

IF 4.3 3区 材料科学 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC ACS Applied Electronic Materials Pub Date : 2024-09-03 DOI:10.1137/23m1578632
Stephen Thomas, Allison H. Baker
{"title":"GPU 上的高效 GMRES+AMG:复合平滑器和混合[数学]循环","authors":"Stephen Thomas, Allison H. Baker","doi":"10.1137/23m1578632","DOIUrl":null,"url":null,"abstract":"SIAM Journal on Scientific Computing, Ahead of Print. <br/> Abstract. In this study, we introduce algorithms optimized for GPU architectures, aimed at efficiently solving large sparse linear systems, a central challenge in Navier–Stokes pressure projection problems. Our approach includes an adaptation of the GMRES algorithm, drawing inspiration from the merged vector operations first proposed by Bielich et al. [Parallel Comput., 112 (2022), 102940]. This adaptation increases computational intensity on GPU platforms through optimized vector update strategies. The algorithm incorporates modified and classical Gram–Schmidt methods with an algebraic multigrid (AMG) preconditioner, each tailored for GPU performance. A key innovation in our work is the development of a Gram–Schmidt projector [math] employing a rank-1 perturbation of the identity matrix. Designed to maximize the high memory bandwidth utilization of the AMD MI-250X GPU, this approach includes a strategy for treating the unit diagonal that minimizes memory reads, leading to a 25% increase in computational efficiency. The application of perturbation theory further ensures that orthogonality loss is limited to [math], where [math] is the number of iterations. Additionally, we introduce a mixed AMG [math]-cycle strategy combining ILU(0) and [math]-Jacobi smoothers, which achieves a 30–50% reduction in GPU compute times compared to conventional methods, while maintaining low backward error. This strategy, alongside our novel treatment of the diagonal in triangular matrices, marks a substantial increase in AMG efficicency for GPU systems. We believe that these contributions represent a significant advance in optimizing GMRES+AMG algorithms for GPU computations. The empirical results demonstrate notable speed increments and maintain rigorous backward error bounds, underscoring the potential of our methods to substantially increase computational efficiency in large-scale scientific applications.","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Efficient GMRES+AMG on GPUs: Composite Smoothers And Mixed [math]-Cycles\",\"authors\":\"Stephen Thomas, Allison H. Baker\",\"doi\":\"10.1137/23m1578632\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"SIAM Journal on Scientific Computing, Ahead of Print. <br/> Abstract. In this study, we introduce algorithms optimized for GPU architectures, aimed at efficiently solving large sparse linear systems, a central challenge in Navier–Stokes pressure projection problems. Our approach includes an adaptation of the GMRES algorithm, drawing inspiration from the merged vector operations first proposed by Bielich et al. [Parallel Comput., 112 (2022), 102940]. This adaptation increases computational intensity on GPU platforms through optimized vector update strategies. The algorithm incorporates modified and classical Gram–Schmidt methods with an algebraic multigrid (AMG) preconditioner, each tailored for GPU performance. A key innovation in our work is the development of a Gram–Schmidt projector [math] employing a rank-1 perturbation of the identity matrix. Designed to maximize the high memory bandwidth utilization of the AMD MI-250X GPU, this approach includes a strategy for treating the unit diagonal that minimizes memory reads, leading to a 25% increase in computational efficiency. The application of perturbation theory further ensures that orthogonality loss is limited to [math], where [math] is the number of iterations. Additionally, we introduce a mixed AMG [math]-cycle strategy combining ILU(0) and [math]-Jacobi smoothers, which achieves a 30–50% reduction in GPU compute times compared to conventional methods, while maintaining low backward error. This strategy, alongside our novel treatment of the diagonal in triangular matrices, marks a substantial increase in AMG efficicency for GPU systems. We believe that these contributions represent a significant advance in optimizing GMRES+AMG algorithms for GPU computations. The empirical results demonstrate notable speed increments and maintain rigorous backward error bounds, underscoring the potential of our methods to substantially increase computational efficiency in large-scale scientific applications.\",\"PeriodicalId\":3,\"journal\":{\"name\":\"ACS Applied Electronic Materials\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2024-09-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Applied Electronic Materials\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1137/23m1578632\",\"RegionNum\":3,\"RegionCategory\":\"材料科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1137/23m1578632","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

摘要

SIAM 科学计算期刊》,提前印刷。 摘要在本研究中,我们介绍了针对 GPU 架构进行优化的算法,旨在高效求解大型稀疏线性系统,这是 Navier-Stokes 压力投影问题的核心挑战。我们的方法包括对 GMRES 算法的改编,从 Bielich 等人首次提出的合并矢量运算中汲取灵感[《并行计算》,112 (2022),102940]。这种调整通过优化向量更新策略提高了 GPU 平台的计算强度。该算法结合了修正的经典格兰-施密特方法和代数多网格(AMG)预处理器,每种方法都是为 GPU 性能量身定制的。我们工作中的一项关键创新是开发了一种格拉姆-施密特投影器[math],采用了秩-1扰动特征矩阵。这种方法旨在最大限度地利用 AMD MI-250X GPU 的高内存带宽,其中包括一种处理单元对角线的策略,可最大限度地减少内存读取,从而将计算效率提高 25%。扰动理论的应用进一步确保了正交损失仅限于 [math],其中 [math] 是迭代次数。此外,我们还引入了一种混合 AMG [math]循环策略,它结合了 ILU(0) 和 [math]-Jacobi 平滑器,与传统方法相比,GPU 计算时间减少了 30-50%,同时保持了较低的后向误差。这一策略以及我们对三角形矩阵对角线的新颖处理,标志着 GPU 系统 AMG 效率的大幅提升。我们相信,这些贡献代表了为 GPU 计算优化 GMRES+AMG 算法的重大进展。实证结果显示了显著的速度提升,并保持了严格的后向误差边界,突出了我们的方法在大规模科学应用中大幅提高计算效率的潜力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Efficient GMRES+AMG on GPUs: Composite Smoothers And Mixed [math]-Cycles
SIAM Journal on Scientific Computing, Ahead of Print.
Abstract. In this study, we introduce algorithms optimized for GPU architectures, aimed at efficiently solving large sparse linear systems, a central challenge in Navier–Stokes pressure projection problems. Our approach includes an adaptation of the GMRES algorithm, drawing inspiration from the merged vector operations first proposed by Bielich et al. [Parallel Comput., 112 (2022), 102940]. This adaptation increases computational intensity on GPU platforms through optimized vector update strategies. The algorithm incorporates modified and classical Gram–Schmidt methods with an algebraic multigrid (AMG) preconditioner, each tailored for GPU performance. A key innovation in our work is the development of a Gram–Schmidt projector [math] employing a rank-1 perturbation of the identity matrix. Designed to maximize the high memory bandwidth utilization of the AMD MI-250X GPU, this approach includes a strategy for treating the unit diagonal that minimizes memory reads, leading to a 25% increase in computational efficiency. The application of perturbation theory further ensures that orthogonality loss is limited to [math], where [math] is the number of iterations. Additionally, we introduce a mixed AMG [math]-cycle strategy combining ILU(0) and [math]-Jacobi smoothers, which achieves a 30–50% reduction in GPU compute times compared to conventional methods, while maintaining low backward error. This strategy, alongside our novel treatment of the diagonal in triangular matrices, marks a substantial increase in AMG efficicency for GPU systems. We believe that these contributions represent a significant advance in optimizing GMRES+AMG algorithms for GPU computations. The empirical results demonstrate notable speed increments and maintain rigorous backward error bounds, underscoring the potential of our methods to substantially increase computational efficiency in large-scale scientific applications.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
7.20
自引率
4.30%
发文量
567
期刊最新文献
Vitamin B12: prevention of human beings from lethal diseases and its food application. Current status and obstacles of narrowing yield gaps of four major crops. Cold shock treatment alleviates pitting in sweet cherry fruit by enhancing antioxidant enzymes activity and regulating membrane lipid metabolism. Removal of proteins and lipids affects structure, in vitro digestion and physicochemical properties of rice flour modified by heat-moisture treatment. Investigating the impact of climate variables on the organic honey yield in Turkey using XGBoost machine learning.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1