Improving the Performance of CA-GMRES on Multicores with Multiple GPUs

I. Yamazaki, H. Anzt, S. Tomov, M. Hoemmen, J. Dongarra
{"title":"Improving the Performance of CA-GMRES on Multicores with Multiple GPUs","authors":"I. Yamazaki, H. Anzt, S. Tomov, M. Hoemmen, J. Dongarra","doi":"10.1109/IPDPS.2014.48","DOIUrl":null,"url":null,"abstract":"The Generalized Minimum Residual (GMRES) method is one of the most widely-used iterative methods for solving nonsymmetric linear systems of equations. In recent years, techniques to avoid communication in GMRES have gained attention because in comparison to floating-point operations, communication is becoming increasingly expensive on modern computers. Since graphics processing units (GPUs) are now becoming crucial component in computing, we investigate the effectiveness of these techniques on multicore CPUs with multiple GPUs. While we present the detailed performance studies of a matrix powers kernel on multiple GPUs, we particularly focus on orthogonalization strategies that have a great impact on both the numerical stability and performance of GMRES, especially as the matrix becomes sparser or ill-conditioned. We present the experimental results on two eight-core Intel Sandy Bridge CPUs with three NDIVIA Fermi GPUs and demonstrate that significant speedups can be obtained by avoiding communication, either on a GPU or between the GPUs. As part of our study, we investigate several optimization techniques for the GPU kernels that can also be used in other iterative solvers besides GMRES. Hence, our studies not only emphasize the importance of avoiding communication on GPUs, but they also provide insight about the effects of these optimization techniques on the performance of the sparse solvers, and may have greater impact beyond GMRES.","PeriodicalId":309291,"journal":{"name":"2014 IEEE 28th International Parallel and Distributed Processing Symposium","volume":"47 8","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"50","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 28th International Parallel and Distributed Processing Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2014.48","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 50

Abstract

The Generalized Minimum Residual (GMRES) method is one of the most widely-used iterative methods for solving nonsymmetric linear systems of equations. In recent years, techniques to avoid communication in GMRES have gained attention because in comparison to floating-point operations, communication is becoming increasingly expensive on modern computers. Since graphics processing units (GPUs) are now becoming crucial component in computing, we investigate the effectiveness of these techniques on multicore CPUs with multiple GPUs. While we present the detailed performance studies of a matrix powers kernel on multiple GPUs, we particularly focus on orthogonalization strategies that have a great impact on both the numerical stability and performance of GMRES, especially as the matrix becomes sparser or ill-conditioned. We present the experimental results on two eight-core Intel Sandy Bridge CPUs with three NDIVIA Fermi GPUs and demonstrate that significant speedups can be obtained by avoiding communication, either on a GPU or between the GPUs. As part of our study, we investigate several optimization techniques for the GPU kernels that can also be used in other iterative solvers besides GMRES. Hence, our studies not only emphasize the importance of avoiding communication on GPUs, but they also provide insight about the effects of these optimization techniques on the performance of the sparse solvers, and may have greater impact beyond GMRES.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
提高CA-GMRES在多核多gpu上的性能
广义最小残差法(GMRES)是求解非对称线性方程组最常用的迭代方法之一。近年来,GMRES中避免通信的技术引起了人们的关注,因为与浮点运算相比,通信在现代计算机上变得越来越昂贵。由于图形处理单元(gpu)现在成为计算中的关键组件,我们研究了这些技术在具有多个gpu的多核cpu上的有效性。虽然我们在多个gpu上详细介绍了矩阵幂核的性能研究,但我们特别关注对GMRES的数值稳定性和性能有很大影响的正交化策略,特别是当矩阵变得稀疏或病态时。我们展示了两个八核英特尔Sandy Bridge cpu和三个NDIVIA Fermi GPU的实验结果,并证明通过避免GPU上或GPU之间的通信可以获得显着的速度提升。作为我们研究的一部分,我们研究了几种GPU内核的优化技术,这些技术也可以用于除GMRES之外的其他迭代求解器。因此,我们的研究不仅强调了在gpu上避免通信的重要性,而且还提供了这些优化技术对稀疏求解器性能的影响的见解,并且可能具有比GMRES更大的影响。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Improving the Performance of CA-GMRES on Multicores with Multiple GPUs Multi-resource Real-Time Reader/Writer Locks for Multiprocessors Energy-Efficient Time-Division Multiplexed Hybrid-Switched NoC for Heterogeneous Multicore Systems Scaling Irregular Applications through Data Aggregation and Software Multithreading Heterogeneity-Aware Workload Placement and Migration in Distributed Sustainable Datacenters
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1