Jongsoo Park, M. Smelyanskiy, U. Yang, Dheevatsa Mudigere, P. Dubey
SC15: International Conference for High Performance Computing, Networking, Storage and Analysis. Published 2015-11-15. DOI: https://doi.org/10.1145/2807591.2807603
High-performance algebraic multigrid solver optimized for multi-core based distributed parallel systems
Algebraic Multigrid (AMG) is a linear solver, well known for its linear computational complexity and excellent parallel scalability. As a result, AMG is expected to be a solver of choice for emerging extreme-scale systems capable of delivering a hundred Pflops and beyond. While node-level performance of AMG is generally limited by memory bandwidth, achieving high bandwidth efficiency is challenging because its computations are highly sparse and irregular: triple sparse matrix products, sparse-matrix dense-vector multiplications, independent set coarsening algorithms, and smoothers such as Gauss-Seidel. We develop and analyze a highly optimized AMG implementation, based on the well-known HYPRE library. Compared to the HYPRE baseline implementation, our optimized implementation achieves a 2.0x speedup on a recent Intel® Xeon® Haswell processor. Combined with our other multi-node optimizations, this translates into similarly high speedups when weak-scaled to multiple nodes. In addition, our implementation achieves a 1.3x speedup compared to AmgX, NVIDIA's high-performance implementation of AMG, running on a K40c GPU.
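To make the "sparse irregular computation" above concrete, the following is a minimal Python sketch of two of the kernels the abstract names: a sparse-matrix dense-vector multiplication and one Gauss-Seidel smoothing sweep. It is an illustration only, not the authors' optimized implementation and not HYPRE code. The compressed sparse row (CSR) layout, the function names `csr_spmv` and `gauss_seidel_sweep`, and the small 1D Poisson test matrix are all assumptions chosen for the example.

```python
def csr_spmv(indptr, indices, data, x):
    """Return y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            # x is read through the column index: an indirect, irregular access.
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y


def gauss_seidel_sweep(indptr, indices, data, b, x):
    """Apply one forward Gauss-Seidel sweep to A x = b, updating x in place."""
    for i in range(len(b)):
        diag = 0.0
        acc = b[i]
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                diag = data[k]
            else:
                # For j < i this reads values already updated in this sweep,
                # which is the loop-carried dependence that makes
                # Gauss-Seidel harder to parallelize than SpMV.
                acc -= data[k] * x[j]
        x[i] = acc / diag
    return x


# Hypothetical test problem: 1D Poisson matrix tridiag(-1, 2, -1), n = 4.
indptr = [0, 2, 5, 8, 10]
indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]

x_true = [1.0, 2.0, 3.0, 4.0]
b = csr_spmv(indptr, indices, data, x_true)

# Used as a stand-alone solver here; inside AMG only a few sweeps per level
# would be applied as a smoother.
x = [0.0] * 4
for _ in range(200):
    gauss_seidel_sweep(indptr, indices, data, b, x)
```

Both loops perform about two floating-point operations per stored nonzero while loading a matrix value, a column index, and an indirectly addressed vector entry, which is consistent with the abstract's statement that node-level performance is generally limited by memory bandwidth.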