Performance Evaluation of Different Implementation Schemes of an Iterative Flow Solver on Modern Vector Machines

Supercomput. Front. Innov. | Pub Date: 2019-03-01 | DOI: 10.14529/JSFI190106

Kenta Yamaguchi, Takashi Soga, Yoichi Shimomura, Thorsten Reimann, K. Komatsu, Ryusuke Egawa, A. Musa, H. Takizawa, Hiroaki Kobayashi

Citations: 4

Abstract

Modern supercomputers consist of multi-core processors, and these processors have recently adopted vector instructions, also called SIMD instructions, to improve performance. Numerical simulations need to be vectorized to achieve higher performance on these processors. Legacy numerical simulation codes that have been in use for a long time often contain two versions of the source code: a non-vectorized version and a vectorized version optimized for older vector supercomputers. To achieve higher performance, it is important to clarify which version is better suited to modern supercomputers. In this paper, we evaluate the performance of a legacy fluid dynamics simulation code called FASTEST on modern supercomputers in order to provide a guidepost for migrating such codes. The solver has a non-vectorized version and a vectorized version, and the latter uses the hyperplane ordering method for vectorization. For the evaluation, we also implement the red-black ordering method, which is another way to vectorize the solver. We then examine the performance on NEC SX-ACE, SX-Aurora TSUBASA, Intel Xeon Gold, and Xeon Phi. The results show that the shortest execution times are obtained with the red-black ordering method on SX-ACE and SX-Aurora TSUBASA, and with the non-vectorized version on Xeon Gold and Xeon Phi. Therefore, achieving higher performance on multiple modern supercomputers potentially requires maintaining multiple code versions. We also show that the red-black ordering method is more promising for achieving high performance on modern supercomputers.
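
The abstract does not describe the internals of the FASTEST solver, so the following is only an illustrative sketch of the three orderings it names, applied to a plain Gauss-Seidel sweep on a small 2D Laplace problem. The problem, the grid layout, and every function name here are assumptions made for the example, not taken from the paper. The point it tries to show is that all grid points on one hyperplane (constant `i + j`), or of one colour in the red-black scheme, can be updated independently of each other, which is what makes the inner loop a candidate for vectorization.

```python
def make_grid(n):
    """Return an (n+2) x (n+2) grid: zero interior, top boundary fixed at 1.0."""
    u = [[0.0] * (n + 2) for _ in range(n + 2)]
    for j in range(n + 2):
        u[0][j] = 1.0
    return u


def update(u, i, j):
    """Gauss-Seidel update for the 5-point Laplace stencil (in place)."""
    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])


def sweep_lexicographic(u, n):
    """Non-vectorizable order: each point needs its just-updated left/upper neighbour."""
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            update(u, i, j)


def sweep_hyperplane(u, n):
    """Hyperplane order: points with i + j == d only depend on planes d-1 and d+1,
    so the inner loop has no dependence between its iterations."""
    for d in range(2, 2 * n + 1):
        for i in range(max(1, d - n), min(n, d - 1) + 1):
            update(u, i, d - i)


def sweep_red_black(u, n):
    """Red-black order: all points of one colour first, then the other colour.
    Neighbours of a point always have the opposite colour."""
    for colour in (0, 1):
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if (i + j) % 2 == colour:
                    update(u, i, j)


def run(sweep, n, sweeps):
    """Apply `sweep` to a fresh grid `sweeps` times and return the grid."""
    u = make_grid(n)
    for _ in range(sweeps):
        sweep(u, n)
    return u


def max_diff(a, b):
    """Largest absolute element-wise difference between two grids."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    n = 6
    lex = run(sweep_lexicographic, n, 200)
    print("hyperplane vs lexicographic:", max_diff(lex, run(sweep_hyperplane, n, 200)))
    print("red-black  vs lexicographic:", max_diff(lex, run(sweep_red_black, n, 200)))
```

In this sketch the hyperplane sweep visits points in a different order but reads exactly the same neighbour values as the lexicographic sweep, so it reproduces the same iterates. The red-black sweep changes which neighbour values are already updated, so its intermediate iterates differ, although on this problem it converges to the same solution. Pure Python is used only to keep the example dependency-free; it says nothing about the performance differences the paper measures.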
