Xiangcheng Sun, Keichi Takahashi, Yoichi Shimomura, Hiroyuki Takizawa, Xian Wang
Computer Physics Communications, Volume 307, Article 109411. Published 2024-10-28. DOI: 10.1016/j.cpc.2024.109411
https://www.sciencedirect.com/science/article/pii/S0010465524003345
Performance evaluation of the LBM simulations in fluid dynamics on SX-Aurora TSUBASA vector engine
The lattice Boltzmann method (LBM), combined with high-performance computing (HPC) technologies such as graphics processing units (GPUs), has been widely adopted to solve complex problems in fluid dynamics. In addition to GPUs, the vector engine (VE) developed by NEC Corporation has emerged as an effective solution for memory-intensive numerical simulations such as LBM. Consequently, it is important to evaluate the performance of LBM simulations accelerated by the VE. This study discusses our self-developed LBM code for both classical and fused implementations on the VE. Through numerical simulations of 2D and 3D lid-driven cavity flows, the performance of the new VE Type 30A (VE30) on large-scale grids is evaluated and analyzed, and compared against results obtained with VE Type 20B (VE20), NVIDIA A100 GPU (A100) and H100 GPU (H100). The results indicate that, regardless of the LBM implementation, H100 achieves the highest performance. Owing to the substantial enhancements in VE30's memory hierarchy, the performance of the streaming kernel in the classical implementation of LBM is significantly improved compared to VE20 and A100, approaching that of H100. However, because the fused implementation requires fewer memory accesses, the performance of VE30 is lower than that of H100 in the fused implementation. It is also anticipated that, for specific physical problems and requirements, VE30 will show clear performance potential in LBM simulations with large grid sizes.
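The difference between the classical and fused implementations mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes a D2Q9 lattice with BGK collision, periodic boundaries instead of a lid-driven cavity, and an arbitrary relaxation time `TAU`; all function names are hypothetical and chosen for illustration only.

```python
# Illustrative D2Q9 BGK sketch (assumptions: periodic boundaries, TAU chosen arbitrarily).
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
TAU = 0.8  # hypothetical relaxation time


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution for one cell."""
    usq = ux * ux + uy * uy
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        feq.append(w * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq


def collide_cell(f):
    """BGK collision for the nine distributions of a single cell."""
    rho = sum(f)
    ux = sum(fi * e[0] for fi, e in zip(f, E)) / rho
    uy = sum(fi * e[1] for fi, e in zip(f, E)) / rho
    feq = equilibrium(rho, ux, uy)
    return [fi - (fi - fe) / TAU for fi, fe in zip(f, feq)]


def empty_field(nx, ny):
    return [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]


def step_classical(f, nx, ny):
    """Two separate kernels: the whole field is read and written twice per step."""
    # Kernel 1: collision, stores a full post-collision field.
    post = [[collide_cell(f[x][y]) for y in range(ny)] for x in range(nx)]
    # Kernel 2: streaming, re-reads that field and writes it to shifted positions.
    new = empty_field(nx, ny)
    for x in range(nx):
        for y in range(ny):
            for i, (ex, ey) in enumerate(E):
                new[(x + ex) % nx][(y + ey) % ny][i] = post[x][y][i]
    return new


def step_fused(f, nx, ny):
    """One kernel: each cell is collided and pushed to its neighbours directly."""
    new = empty_field(nx, ny)
    for x in range(nx):
        for y in range(ny):
            post = collide_cell(f[x][y])  # stays local, no intermediate field
            for i, (ex, ey) in enumerate(E):
                new[(x + ex) % nx][(y + ey) % ny][i] = post[i]
    return new


def init_field(nx, ny):
    """Fluid at rest with a small, arbitrary density perturbation."""
    return [[equilibrium(1.0 + 0.01 * ((3 * x + y) % 5), 0.0, 0.0)
             for y in range(ny)] for x in range(nx)]


def total_mass(f):
    return sum(sum(cell) for column in f for cell in column)
```

Both step functions compute the same update. The fused variant skips the intermediate post-collision field, so it reads and writes the distribution array once per time step instead of twice, which is the sense in which a fused implementation needs fewer memory accesses.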
About the journal:
The focus of CPC is on contemporary computational methods and techniques and their implementation, the effectiveness of which will normally be evidenced by the author(s) within the context of a substantive problem in physics. Within this setting CPC publishes two types of paper.
Computer Programs in Physics (CPiP)
These papers describe significant computer programs to be archived in the CPC Program Library which is held in the Mendeley Data repository. The submitted software must be covered by an approved open source licence. Papers and associated computer programs that address a problem of contemporary interest in physics that cannot be solved by current software are particularly encouraged.
Computational Physics Papers (CP)
These are research papers in, but are not limited to, the following themes across computational physics and related disciplines.
mathematical and numerical methods and algorithms;
computational models including those associated with the design, control and analysis of experiments; and
algebraic computation.
Each will normally include software implementation and performance details. The software implementation should, ideally, be available via GitHub, Zenodo or an institutional repository. In addition, research papers on the impact of advanced computer architecture and special purpose computers on computing in the physical sciences, and on software topics related to, and of importance in, the physical sciences, may be considered.