Journal: Computers & Fluids (impact factor 2.5; JCR Q3, Computer Science, Interdisciplinary Applications)
Published: 2024-10-26
DOI: 10.1016/j.compfluid.2024.106460
URL: https://www.sciencedirect.com/science/article/pii/S0045793024002913
GPU parallel implementation of a finite volume lattice Boltzmann method for incompressible flows
This work presents a graphics processing unit (GPU) parallel algorithm for a cell-centered finite volume lattice Boltzmann method (FVLBM) on unstructured meshes. The parallelization is performed in physical space. To reduce the frequency of GPU memory accesses, the algorithm uses coalesced access to GPU memory. In addition, two problems must be avoided: races for resources that lead to data anomalies (such as dirty reads or phantom reads), and double counting in the flux calculation. To this end, the efficient face-based data structure that the central processing unit (CPU) version of FVLBM often uses to compute fluxes in cells is modified into a face-based data structure used to compute the fluxes on all faces, followed by a cell-based loop that assembles the final residuals in all cells. The proposed GPU parallel algorithm therefore needs no resource locks and retains the high efficiency of the face-based data structure in the flux computation, which enhances its parallel efficiency. To demonstrate the computational efficiency of the proposed algorithm, benchmark studies are performed in double precision on an NVIDIA GeForce RTX 3090 Ti GPU card, including (a) the lid-driven flow in a two-dimensional (2D) square cavity, (b) a 2D flow past a cylinder, and (c) the lid-driven flow in a three-dimensional (3D) cubic cavity. The numerical results show that the proposed GPU parallel algorithm can be as accurate as the original CPU serial scheme while achieving a speedup of 1 to 2 orders of magnitude.
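The two-pass idea described in the abstract (fluxes on all faces first, then a cell-based loop for the residuals) can be illustrated with a small serial sketch. This is not the authors' code. The mesh (a periodic ring of four cells), the scalar upwind flux standing in for the FVLBM flux, and every name used (`face_cells`, `cell_faces`, `compute_face_fluxes`, `gather_residuals`, `scatter_residuals`) are assumptions made for illustration only. The point is the data flow: in pass 1 each face writes only its own flux, so each flux is computed once; in pass 2 each cell writes only its own residual, so no two tasks write to the same memory location and no lock is needed.

```python
# Illustrative sketch only: a scalar upwind flux replaces the FVLBM flux,
# and all names and the toy mesh are hypothetical, not taken from the paper.

def compute_face_fluxes(face_cells, face_area, velocity, phi):
    """Pass 1: one independent task per face (one GPU thread per face).
    Each task writes only fluxes[f], so each flux is computed exactly once."""
    fluxes = []
    for (owner, neighbour), area in zip(face_cells, face_area):
        upwind = phi[owner] if velocity >= 0.0 else phi[neighbour]
        fluxes.append(velocity * area * upwind)  # flux directed owner -> neighbour
    return fluxes

def build_cell_faces(n_cells, face_cells):
    """Connectivity for pass 2: for each cell, its faces and orientation signs
    (+1 if the cell is the face's owner, -1 if it is the neighbour)."""
    cell_faces = [[] for _ in range(n_cells)]
    for f, (owner, neighbour) in enumerate(face_cells):
        cell_faces[owner].append((f, 1.0))
        cell_faces[neighbour].append((f, -1.0))
    return cell_faces

def gather_residuals(cell_faces, fluxes, volume):
    """Pass 2: one independent task per cell (one GPU thread per cell).
    Each task only reads fluxes and writes its own residual: no write conflict."""
    residuals = []
    for c, faces in enumerate(cell_faces):
        net_outflow = sum(sign * fluxes[f] for f, sign in faces)
        residuals.append(-net_outflow / volume[c])
    return residuals

def scatter_residuals(n_cells, face_cells, fluxes, volume):
    """Single face loop that scatters into both adjacent cells. Correct when run
    serially, but two faces of one cell would race if run as parallel threads."""
    residuals = [0.0] * n_cells
    for f, (owner, neighbour) in enumerate(face_cells):
        residuals[owner] -= fluxes[f] / volume[owner]
        residuals[neighbour] += fluxes[f] / volume[neighbour]
    return residuals

# Toy mesh: periodic ring of 4 cells; face f lies between cell f and cell (f + 1) % 4.
n_cells = 4
face_cells = [(f, (f + 1) % n_cells) for f in range(n_cells)]
face_area = [1.0] * n_cells
volume = [1.0] * n_cells
phi = [1.0, 2.0, 4.0, 8.0]

fluxes = compute_face_fluxes(face_cells, face_area, 1.0, phi)
cell_faces = build_cell_faces(n_cells, face_cells)
residuals = gather_residuals(cell_faces, fluxes, volume)
reference = scatter_residuals(n_cells, face_cells, fluxes, volume)
print(residuals)  # [7.0, -1.0, -2.0, -4.0]
```

In a GPU version under these assumptions, each of the two loops would map to its own kernel, with the face fluxes held in a temporary array between the two launches. The gather pass costs extra memory for that array but removes the need for atomics or locks that the scatter variant would require.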
About the journal:
Computers & Fluids is multidisciplinary. The term "fluid" is interpreted in the broadest sense. Hydro- and aerodynamics, high-speed and physical gas dynamics, turbulence and flow stability, multiphase flow, rheology, tribology and fluid-structure interaction are all of interest, provided that computer technique plays a significant role in the associated studies or design methodology.