Multiple-GPU accelerated high-order gas-kinetic scheme on three-dimensional unstructured meshes

IF 7.2 | Zone 2, Physics and Astrophysics | Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Computer Physics Communications | Pub Date: 2025-01-22 | DOI: 10.1016/j.cpc.2025.109513
Yuhang Wang, Waixiang Cao, Liang Pan
Computer Physics Communications, Volume 310, Article 109513. https://www.sciencedirect.com/science/article/pii/S0010465525000165

Abstract

High-order gas-kinetic schemes (HGKS) on unstructured meshes have recently been applied successfully to compressible flows. In this paper, to accelerate the computation, HGKS is implemented on the graphics processing unit (GPU) using the compute unified device architecture (CUDA). HGKS on unstructured meshes is a fully explicit scheme, so the acceleration framework can be built on cell-level parallelism. For single-GPU computation, the connectivity of the geometric information is generated to meet the requirements of data localization and independence. Based on this data structure, the CUDA kernels and their corresponding grids are set. With a one-to-one mapping between cell indices and CUDA threads, HGKS can be implemented for single-GPU computation. For multiple-GPU computation, domain decomposition and data exchange must also be taken into account. The domain is decomposed into subdomains by METIS, and MPI processes are created to manage per-process control and the communication among GPUs. By reconstructing the connectivity and adding ghost cells, each GPU can inherit the main CUDA configuration used for a single GPU. Benchmark cases for compressible flows, including an accuracy test and the flow past a sphere, are presented to assess the numerical performance of HGKS on Nvidia RTX A5000 and Tesla V100 GPUs. For single-GPU computation, compared with the parallel central processing unit (CPU) code running on the Intel Xeon Gold 5120 CPU with open multi-processing (OpenMP) directives, a 5x speedup is achieved by the RTX A5000 and a 9x speedup by the Tesla V100. For multiple-GPU computation, the HGKS code scales properly as the number of GPUs increases. Numerical results confirm the excellent performance of multiple-GPU accelerated HGKS on unstructured meshes.
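The step of reconstructing connectivity and adding ghost cells can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_subdomain`, the face-neighbor list `neighbors`, and the partition array `part` (standing in for the output of a partitioner such as METIS) are all assumptions made for illustration, and only a single layer of face-adjacent ghost cells is collected, whereas a high-order stencil may need a wider layer.

```python
from typing import List, Tuple


def build_subdomain(
    neighbors: List[List[int]], part: List[int], rank: int
) -> Tuple[List[int], List[int], List[List[int]]]:
    """Return (owned, ghosts, local_neighbors) for one subdomain.

    neighbors[c] holds the global indices of the cells sharing a face with
    cell c; -1 marks a physical boundary face (hypothetical convention).
    part[c] is the subdomain id assigned to cell c.
    """
    # Cells owned by this subdomain keep their relative order and get
    # local indices 0 .. len(owned) - 1.
    owned = [c for c, p in enumerate(part) if p == rank]
    global_to_local = {g: l for l, g in enumerate(owned)}

    # Any neighbor owned by another subdomain becomes a ghost cell and is
    # appended after the owned cells in the local numbering.
    ghosts: List[int] = []
    for c in owned:
        for n in neighbors[c]:
            if n >= 0 and n not in global_to_local:
                global_to_local[n] = len(owned) + len(ghosts)
                ghosts.append(n)

    # Rewrite the connectivity of the owned cells in local indices.
    local_neighbors = [
        [global_to_local[n] if n >= 0 else -1 for n in neighbors[c]]
        for c in owned
    ]
    return owned, ghosts, local_neighbors


# A chain of six cells split into two subdomains of three cells each.
neighbors = [[-1, 1], [0, 2], [1, 3], [2, 4], [3, 5], [4, -1]]
part = [0, 0, 0, 1, 1, 1]
for rank in (0, 1):
    print(rank, build_subdomain(neighbors, part, rank))
```

In this numbering, owned cell `i` could be handled by thread `i` of a kernel, mirroring the one-to-one mapping between cells and threads described above, while the ghost entries at the end of the local arrays would be filled by data received from neighboring subdomains before each stage.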