Performance Evaluation of Different Implementation Schemes of an Iterative Flow Solver on Modern Vector Machines

Kenta Yamaguchi, Takashi Soga, Yoichi Shimomura, Thorsten Reimann, K. Komatsu, Ryusuke Egawa, A. Musa, H. Takizawa, Hiroaki Kobayashi
Journal: Supercomput. Front. Innov.
Published: 2019-03-01
DOI: 10.14529/JSFI190106 (https://doi.org/10.14529/JSFI190106)
Citations: 4

Abstract

Modern supercomputers consist of multi-core processors, and these processors have recently adopted vector instructions, also called SIMD instructions, to improve performance. Numerical simulations need to be vectorized in order to achieve higher performance on these processors. Legacy numerical simulation codes that have been in use for a long time often contain two versions of the source code: a non-vectorized version and a vectorized version that is optimized for old vector supercomputers. To achieve higher performance, it is important to clarify which version is better suited to modern supercomputers. In this paper, we evaluate the performance of a legacy fluid dynamics simulation code called FASTEST on modern supercomputers in order to provide a guidepost for migrating such codes to these machines. The solver has a non-vectorized version and a vectorized version, and the latter uses the hyperplane ordering method for vectorization. For the evaluation, we also implement the red-black ordering method, which is another way to vectorize the solver. We then examine the performance on NEC SX-ACE, SX-Aurora TSUBASA, Intel Xeon Gold, and Xeon Phi. The results show that the shortest execution times are obtained with the red-black ordering method on SX-ACE and SX-Aurora TSUBASA, and with the non-vectorized version on Xeon Gold and Xeon Phi. Therefore, achieving higher performance on multiple modern supercomputers potentially requires maintaining multiple code versions. We also show that the red-black ordering method is the more promising approach for achieving high performance on modern supercomputers.
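To illustrate why red-black ordering exposes vectorizable loops, the following is a minimal sketch and not the FASTEST implementation. The abstract does not describe the solver's stencil or iteration scheme, so the sketch assumes a Gauss-Seidel relaxation of a 2D Poisson problem with a five-point stencil and fixed (Dirichlet) boundary values; the names `red_black_sweep` and `solve` are hypothetical. Grid points are coloured by the parity of `i + j`. Every neighbour of a red point is black and vice versa, so all points of one colour can be updated independently, and the stride-2 inner loop carries no dependency between iterations.

```python
import math


def red_black_sweep(u, f, h):
    """One red-black Gauss-Seidel sweep for -laplace(u) = f (assumed model problem).

    u and f are lists of lists on a uniform grid with spacing h.
    Boundary entries of u are left untouched (Dirichlet values).
    """
    n, m = len(u), len(u[0])
    for color in (0, 1):  # 0 = "red" points, 1 = "black" points
        for i in range(1, n - 1):
            # First interior column j with (i + j) % 2 == color.
            start = 1 + (i + 1 + color) % 2
            # Stride-2 loop: each update reads only points of the other
            # colour, so the iterations are independent of one another.
            for j in range(start, m - 1, 2):
                u[i][j] = 0.25 * (
                    u[i - 1][j] + u[i + 1][j]
                    + u[i][j - 1] + u[i][j + 1]
                    + h * h * f[i][j]
                )
    return u


def solve(f, h, sweeps):
    """Run a fixed number of red-black sweeps from a zero initial guess."""
    u = [[0.0] * len(f[0]) for _ in f]
    for _ in range(sweeps):
        red_black_sweep(u, f, h)
    return u


if __name__ == "__main__":
    n = 17
    h = 1.0 / (n - 1)
    # Right-hand side chosen so that the continuous solution is
    # sin(pi x) * sin(pi y) on the unit square.
    f = [[2.0 * math.pi ** 2 * math.sin(math.pi * i * h) * math.sin(math.pi * j * h)
          for j in range(n)] for i in range(n)]
    u = solve(f, h, 500)
    print(u[n // 2][n // 2])  # close to 1.0, up to discretization error
```

A plain lexicographic Gauss-Seidel sweep would instead read `u[i][j - 1]` from the same inner-loop pass, which creates a loop-carried dependency and prevents straightforward vectorization. Hyperplane ordering removes that dependency in a different way, by grouping points along diagonals of the grid.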