Efficient Parallel Multigrid Method on Intel Xeon Phi Clusters

K. Nakajima, Balazs Gerofi, Y. Ishikawa, Masashi Horikoshi
Published in: The International Conference on High Performance Computing in Asia-Pacific Region Companion
Publication date: 2021-01-20
DOI: 10.1145/3440722.3440882 (https://doi.org/10.1145/3440722.3440882)
Citations: 3

Abstract

The parallel multigrid method is expected to play an important role in scientific computing on exa-scale supercomputer systems for solving large-scale systems of linear equations with sparse matrices. Because solving sparse linear systems is a heavily memory-bound process, an efficient storage format for the coefficient matrices is crucial. In previous work, the authors applied the sliced ELL format to parallel conjugate gradient solvers with multigrid preconditioning (MGCG) in an application for 3D groundwater flow through heterogeneous porous media (pGW3D-FVM), and obtained excellent performance on large-scale multicore/manycore clusters. In the present work, the authors introduce SELL-C-σ to the MGCG solver and evaluate the solver's performance with various types of OpenMP/MPI hybrid parallel programming models on the Oakforest-PACS (OFP) system at JCAHPC, using up to 1,024 Intel Xeon Phi nodes. Because SELL-C-σ is well suited to wide-SIMD architectures such as Xeon Phi, performance improved by more than 20% over sliced ELL. This is one of the first examples of SELL-C-σ applied to the forward/backward substitutions in an ILU-type smoother of a multigrid solver. Furthermore, the effects of IHK/McKernel were investigated, and it achieved an 11% improvement on 1,024 nodes.
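To make the storage format mentioned in the abstract concrete, the following is a minimal sketch of SELL-C-σ as it is generally described: rows are sorted by descending nonzero count inside windows of σ rows, grouped into chunks of C rows, and each chunk is padded only to its own longest row and stored column-major so that C rows map onto SIMD lanes. This is an illustration under stated assumptions, not the authors' implementation. The function names `build_sell_c_sigma` and `spmv`, the input layout (a list of `(column, value)` pairs per row), and the zero-valued padding convention are all hypothetical, and the sketch covers only a sparse matrix-vector product, not the ILU-type forward/backward substitutions studied in the paper.

```python
def build_sell_c_sigma(rows, C, sigma):
    """Build SELL-C-sigma arrays from rows given as lists of (col, val) pairs.

    Hypothetical helper for illustration only.
    """
    n = len(rows)
    # Sort rows by descending nonzero count inside each window of sigma rows.
    perm = []
    for start in range(0, n, sigma):
        window = list(range(start, min(start + sigma, n)))
        window.sort(key=lambda r: -len(rows[r]))  # stable sort keeps ties in order
        perm.extend(window)

    chunk_ptr = [0]   # offset of each chunk in cols/vals
    chunk_len = []    # padded row length (width) of each chunk
    cols, vals = [], []
    for start in range(0, n, C):
        chunk = perm[start:start + C]
        width = max(len(rows[r]) for r in chunk)
        chunk_len.append(width)
        # Column-major inside the chunk: the j-th entries of all C rows are
        # contiguous, which is the layout a SIMD unit of width C would load.
        for j in range(width):
            for lane in range(C):
                if lane < len(chunk) and j < len(rows[chunk[lane]]):
                    c, v = rows[chunk[lane]][j]
                else:
                    c, v = 0, 0.0  # padding entry (assumed convention)
                cols.append(c)
                vals.append(v)
        chunk_ptr.append(len(vals))
    return perm, chunk_ptr, chunk_len, cols, vals


def spmv(sell, x, C):
    """Compute y = A @ x from the SELL-C-sigma arrays built above."""
    perm, chunk_ptr, chunk_len, cols, vals = sell
    y = [0.0] * len(perm)
    for k, width in enumerate(chunk_len):
        base = chunk_ptr[k]
        for j in range(width):
            for lane in range(C):      # this inner loop is the SIMD dimension
                idx = k * C + lane
                if idx < len(perm):
                    p = base + j * C + lane
                    y[perm[idx]] += vals[p] * x[cols[p]]
    return y


# Small 4x4 example with row lengths 1, 3, 1, 2.
rows = [
    [(0, 2.0)],
    [(0, 1.0), (1, 3.0), (2, 1.0)],
    [(2, 4.0)],
    [(1, 1.0), (3, 5.0)],
]
x = [1.0, 2.0, 3.0, 4.0]
sorted_fmt = build_sell_c_sigma(rows, C=2, sigma=4)    # sorting across all rows
unsorted_fmt = build_sell_c_sigma(rows, C=2, sigma=1)  # sigma=1 means no sorting
y = spmv(sorted_fmt, x, C=2)
```

In this toy case the matrix has 7 nonzeros. Plain ELL would pad every row to length 3 and store 12 entries, chunking without sorting (σ = 1) stores 10, and sorting across all four rows (σ = 4) stores 8. Less padding means less memory traffic, which is the property that matters for a memory-bound solver.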