Efficient Parallel Multigrid Method on Intel Xeon Phi Clusters

The International Conference on High Performance Computing in Asia-Pacific Region Companion. Pub Date: 2021-01-20. DOI: 10.1145/3440722.3440882

K. Nakajima, Balazs Gerofi, Y. Ishikawa, Masashi Horikoshi


Citations: 3

Abstract

The parallel multigrid method is expected to play an important role in scientific computing on exascale supercomputer systems for solving large-scale linear equations with sparse matrices. Because solving sparse linear systems is a strongly memory-bound process, an efficient storage method for coefficient matrices is a crucial issue. In previous work, the authors applied the sliced ELL format to parallel conjugate gradient solvers with multigrid preconditioning (MGCG) for an application to 3D groundwater flow through heterogeneous porous media (pGW3D-FVM), and obtained excellent performance on large-scale multicore/manycore clusters. In the present work, the authors introduced SELL-C-σ to the MGCG solver and evaluated its performance with various types of OpenMP/MPI hybrid parallel programming models on the Oakforest-PACS (OFP) system at JCAHPC, using up to 1,024 nodes of Intel Xeon Phi. Because SELL-C-σ is well suited to wide-SIMD architectures such as Xeon Phi, the performance improvement over sliced ELL was more than 20%. This is one of the first examples of SELL-C-σ applied to the forward/backward substitutions in an ILU-type smoother of a multigrid solver. Furthermore, the effects of IHK/McKernel were investigated, and it achieved an 11% improvement on 1,024 nodes.
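To make the storage format named in the abstract concrete, the following is a minimal pure-Python sketch of the general SELL-C-σ idea: rows are sorted by length inside windows of σ rows, grouped into chunks of C rows, and each chunk is padded to its longest row and stored column-major so that one SIMD instruction can process C rows at once. It is not the authors' implementation; the function names `build_sell_c_sigma` and `spmv_sell`, the CSR input layout, and the small test matrix are assumptions made for illustration only.

```python
def build_sell_c_sigma(row_ptr, col_idx, values, C, sigma):
    """Convert a CSR matrix to a SELL-C-sigma layout (illustrative sketch)."""
    n = len(row_ptr) - 1
    row_len = [row_ptr[i + 1] - row_ptr[i] for i in range(n)]

    # Sort rows by descending length inside each window of sigma rows.
    perm = []
    for start in range(0, n, sigma):
        window = list(range(start, min(start + sigma, n)))
        window.sort(key=lambda i: -row_len[i])
        perm.extend(window)

    chunk_ptr = [0]   # offset of each chunk in the packed arrays
    chunk_len = []    # padded row length (width) of each chunk
    sell_val, sell_col = [], []
    for start in range(0, n, C):
        rows = perm[start:start + C]
        width = max(row_len[i] for i in rows)
        chunk_len.append(width)
        # Column-major inside the chunk: entry j of all C rows is contiguous.
        for j in range(width):
            for lane in range(C):
                if lane < len(rows) and j < row_len[rows[lane]]:
                    k = row_ptr[rows[lane]] + j
                    sell_val.append(values[k])
                    sell_col.append(col_idx[k])
                else:
                    sell_val.append(0.0)  # zero padding
                    sell_col.append(0)    # any valid column index works here
        chunk_ptr.append(len(sell_val))
    return perm, chunk_ptr, chunk_len, sell_val, sell_col


def spmv_sell(n, C, perm, chunk_ptr, chunk_len, sell_val, sell_col, x):
    """Compute y = A @ x from the SELL-C-sigma arrays built above."""
    y = [0.0] * n
    for c, width in enumerate(chunk_len):
        base = chunk_ptr[c]
        for j in range(width):
            for lane in range(C):  # this inner loop maps onto SIMD lanes
                r = c * C + lane
                if r < n:
                    k = base + j * C + lane
                    y[perm[r]] += sell_val[k] * x[sell_col[k]]
    return y


# Hypothetical 4x4 test matrix in CSR form (row lengths 2, 1, 3, 1).
row_ptr = [0, 2, 3, 6, 7]
col_idx = [0, 1, 1, 0, 2, 3, 3]
values = [2.0, 1.0, 3.0, 1.0, 4.0, 1.0, 5.0]
x = [1.0, 2.0, 3.0, 4.0]

sorted_fmt = build_sell_c_sigma(row_ptr, col_idx, values, C=2, sigma=4)
unsorted_fmt = build_sell_c_sigma(row_ptr, col_idx, values, C=2, sigma=1)
y = spmv_sell(4, 2, *sorted_fmt, x)
```

With σ = 1 no sorting takes place, which corresponds to plain sliced storage with chunk size C. In this toy case, sorting with σ = 4 groups rows of similar length into the same chunk and reduces the padded storage from 10 to 8 entries, which is the effect that matters for a memory-bound kernel.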


