High-performance algebraic multigrid solver optimized for multi-core based distributed parallel systems
Jongsoo Park, M. Smelyanskiy, U. Yang, Dheevatsa Mudigere, P. Dubey
SC15: International Conference for High Performance Computing, Networking, Storage and Analysis, 2015-11-15
DOI: 10.1145/2807591.2807603 (https://doi.org/10.1145/2807591.2807603)
Citations: 24
Abstract
Algebraic Multigrid (AMG) is a linear solver, well known for its linear computational complexity and excellent parallel scalability. As a result, AMG is expected to be a solver of choice for emerging extreme-scale systems capable of delivering a hundred Pflops and beyond. While node-level performance of AMG is generally limited by memory bandwidth, achieving high bandwidth efficiency is challenging because its computation is highly sparse and irregular: triple sparse matrix products, sparse-matrix dense-vector multiplications, independent-set coarsening algorithms, and smoothers such as Gauss-Seidel. We develop and analyze a highly optimized AMG implementation, based on the well-known HYPRE library. Compared to the HYPRE baseline implementation, our optimized implementation achieves a 2.0x speedup on a recent Intel® Xeon® Haswell processor. Combined with our other multi-node optimizations, this translates into similarly high speedups when weak-scaled to multiple nodes. In addition, our implementation achieves a 1.3x speedup compared to AmgX, NVIDIA's high-performance implementation of AMG, running on a K40c GPU.
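
To make the "sparse irregular computation" concrete, here is a minimal Python sketch of two of the kernels the abstract names: a sparse-matrix dense-vector multiplication and a Gauss-Seidel smoother sweep, both over a matrix in compressed sparse row (CSR) form. It is an illustration only, not the HYPRE code or the optimized implementation described in the paper. The function names, the CSR layout choice, and the 1D Poisson test matrix are all assumptions made for this example.

```python
def poisson_1d_csr(n):
    """Build tridiag(-1, 2, -1) of size n in CSR form (hypothetical test matrix)."""
    row_ptr, col_idx, values = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


def csr_spmv(row_ptr, col_idx, values, x):
    """Return y = A @ x for a CSR matrix A."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read through col_idx: an indirect, data-dependent access.
            # Each nonzero costs one multiply-add but several memory loads,
            # which is why kernels like this tend to be bandwidth bound.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def gauss_seidel_sweep(row_ptr, col_idx, values, b, x):
    """Apply one forward Gauss-Seidel sweep to x in place and return it."""
    n = len(row_ptr) - 1
    for i in range(n):
        diag = 0.0
        acc = b[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = values[k]
            else:
                # Uses x[j] values already updated earlier in this sweep,
                # so row i depends on rows before it (a loop-carried dependence).
                acc -= values[k] * x[j]
        x[i] = acc / diag  # assumes a nonzero diagonal entry in every row
    return x


# Usage: solve A x = b where the exact solution is the all-ones vector.
row_ptr, col_idx, values = poisson_1d_csr(4)
b = csr_spmv(row_ptr, col_idx, values, [1.0] * 4)
x = [0.0] * 4
for _ in range(200):
    gauss_seidel_sweep(row_ptr, col_idx, values, b, x)
```

The sequential dependence inside `gauss_seidel_sweep` is one reason such smoothers are harder to parallelize than `csr_spmv`, whose rows can be processed independently.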