Multiple-GPU accelerated high-order gas-kinetic scheme on three-dimensional unstructured meshes

IF 3.4 2区 物理与天体物理 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Computer Physics Communications Pub Date : 2025-05-01 Epub Date: 2025-01-22 DOI:10.1016/j.cpc.2025.109513
Yuhang Wang, Waixiang Cao, Liang Pan
{"title":"Multiple-GPU accelerated high-order gas-kinetic scheme on three-dimensional unstructured meshes","authors":"Yuhang Wang,&nbsp;Waixiang Cao,&nbsp;Liang Pan","doi":"10.1016/j.cpc.2025.109513","DOIUrl":null,"url":null,"abstract":"<div><div>Recently, successes have been achieved for the high-order gas-kinetic schemes (HGKS) on unstructured meshes for compressible flows. In this paper, to accelerate the computation, HGKS is implemented with the graphical processing unit (GPU) using the compute unified device architecture (CUDA). HGKS on unstructured meshes is a fully explicit scheme, and the acceleration framework can be developed based on the cell-level parallelism. For single-GPU computation, the connectivity of geometric information is generated for the requirement of data localization and independence. Based on such data structure, the kernels and corresponding girds of CUDA are set. With the one-to-one mapping between the indices of cells and CUDA threads, the single-GPU computation using CUDA can be implemented for HGKS. For multiple-GPU computation, the domain decomposition and data exchange need to be taken into account. The domain is decomposed into subdomains by METIS, and the MPI processes are created for the control of each process and communication among GPUs. With reconstruction of connectivity and adding ghost cells, the main configuration of CUDA for single-GPU can be inherited by each GPU. The benchmark cases for compressible flows, including accuracy test and flow passing through a sphere, are presented to assess the numerical performance of HGKS with Nvidia RTX A5000 and Tesla V100 GPUs. For single-GPU computation, compared with the parallel central processing unit (CPU) code running on the Intel Xeon Gold 5120 CPU with open multi-processing (OpenMP) directives, 5x speedup is achieved by RTX A5000 and 9x speedup is achieved by Tesla V100. For multiple-GPU computation, HGKS code scales properly with the increasing number of GPU. Numerical results confirm the excellent performance of multiple-GPU accelerated HGKS on unstructured meshes.</div></div>","PeriodicalId":285,"journal":{"name":"Computer Physics Communications","volume":"310 ","pages":"Article 109513"},"PeriodicalIF":3.4000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Physics Communications","FirstCategoryId":"101","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0010465525000165","RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/22 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

Abstract

Recently, successes have been achieved for the high-order gas-kinetic schemes (HGKS) on unstructured meshes for compressible flows. In this paper, to accelerate the computation, HGKS is implemented with the graphical processing unit (GPU) using the compute unified device architecture (CUDA). HGKS on unstructured meshes is a fully explicit scheme, and the acceleration framework can be developed based on the cell-level parallelism. For single-GPU computation, the connectivity of geometric information is generated for the requirement of data localization and independence. Based on such data structure, the kernels and corresponding girds of CUDA are set. With the one-to-one mapping between the indices of cells and CUDA threads, the single-GPU computation using CUDA can be implemented for HGKS. For multiple-GPU computation, the domain decomposition and data exchange need to be taken into account. The domain is decomposed into subdomains by METIS, and the MPI processes are created for the control of each process and communication among GPUs. With reconstruction of connectivity and adding ghost cells, the main configuration of CUDA for single-GPU can be inherited by each GPU. The benchmark cases for compressible flows, including accuracy test and flow passing through a sphere, are presented to assess the numerical performance of HGKS with Nvidia RTX A5000 and Tesla V100 GPUs. For single-GPU computation, compared with the parallel central processing unit (CPU) code running on the Intel Xeon Gold 5120 CPU with open multi-processing (OpenMP) directives, 5x speedup is achieved by RTX A5000 and 9x speedup is achieved by Tesla V100. For multiple-GPU computation, HGKS code scales properly with the increasing number of GPU. Numerical results confirm the excellent performance of multiple-GPU accelerated HGKS on unstructured meshes.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
三维非结构化网格的多gpu加速高阶气体动力学格式
近年来,可压缩流动的非结构化网格高阶气体动力学格式(HGKS)取得了成功。为了加快计算速度,本文采用CUDA计算统一设备架构(CUDA)的图形处理单元(GPU)实现了HGKS。非结构化网格上的HGKS是一种完全显式的方案,可以基于单元级并行性开发加速框架。对于单gpu计算,为了满足数据定位和独立性的要求,生成几何信息的连通性。基于这种数据结构,设置了CUDA的核和相应的网格。利用单元格索引与CUDA线程之间的一对一映射,可以实现HGKS的CUDA单gpu计算。对于多gpu计算,需要考虑域分解和数据交换。通过METIS将域分解为子域,并创建MPI进程来控制每个进程和gpu之间的通信。通过重建连通性和添加鬼细胞,单个GPU的CUDA主配置可以被每个GPU继承。采用Nvidia RTX A5000和Tesla V100 gpu对可压缩流进行了精度测试和通过球体流的基准测试,以评估HGKS的数值性能。对于单gpu计算,与运行在Intel至强Gold 5120 CPU上具有开放多处理(OpenMP)指令的并行中央处理单元(CPU)代码相比,RTX A5000实现了5倍的加速,Tesla V100实现了9倍的加速。对于多GPU计算,HGKS代码随着GPU数量的增加而适当扩展。数值结果证实了多gpu加速HGKS在非结构化网格上的优异性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Computer Physics Communications
Computer Physics Communications 物理-计算机:跨学科应用
CiteScore
12.10
自引率
3.20%
发文量
287
审稿时长
5.3 months
期刊介绍: The focus of CPC is on contemporary computational methods and techniques and their implementation, the effectiveness of which will normally be evidenced by the author(s) within the context of a substantive problem in physics. Within this setting CPC publishes two types of paper. Computer Programs in Physics (CPiP) These papers describe significant computer programs to be archived in the CPC Program Library which is held in the Mendeley Data repository. The submitted software must be covered by an approved open source licence. Papers and associated computer programs that address a problem of contemporary interest in physics that cannot be solved by current software are particularly encouraged. Computational Physics Papers (CP) These are research papers in, but are not limited to, the following themes across computational physics and related disciplines. mathematical and numerical methods and algorithms; computational models including those associated with the design, control and analysis of experiments; and algebraic computation. Each will normally include software implementation and performance details. The software implementation should, ideally, be available via GitHub, Zenodo or an institutional repository.In addition, research papers on the impact of advanced computer architecture and special purpose computers on computing in the physical sciences and software topics related to, and of importance in, the physical sciences may be considered.
期刊最新文献
Transformation techniques for weakly and nearly singular integrals: Comparative performance and numerical implementation Gas-kinetic scheme for compressible gas-particle system with coarse grained discrete element method MQCT 2026: a program for calculations of inelastic scattering of two molecules (new version announcement) GPU-accelerated PICNIC and S-PICNIC methods for solving space charge effects in beam dynamics simulations T3FF: Toroidal 3-dimensional full-f full-k code for isothermal gyrofluid drift-Alfvén turbulence
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1