Parallel Thomas approach development for solving tridiagonal systems in GPU programming − steady and unsteady flow simulation
M. Souri, P. Akbarzadeh, H. M. Darian
Mechanics & Industry, 2020. DOI: 10.1051/meca/2020013 (https://doi.org/10.1051/meca/2020013)
Citations: 3
Abstract
The solution of tridiagonal systems of equations on graphics processing units (GPUs) is assessed. A parallel Thomas algorithm (PTA) is developed and compared with two known parallel algorithms, cyclic reduction (CR) and parallel cyclic reduction (PCR). The lid-driven cavity problem is used to assess these parallel approaches. The same problem is also simulated with the classic Thomas algorithm running on a central processing unit (CPU). Runtimes and physical parameters of the GPU and CPU algorithms are compared. The results show that the speedups of CR, PCR and PTA over the CPU runtime are 4.4x, 5.2x and 38.5x, respectively. Furthermore, the effect of coalesced versus uncoalesced access to GPU global memory is examined for PTA, and coalesced access yields a 2x speedup. Additionally, PTA performance is assessed for a time-dependent problem, the unsteady flow over a square, where a 9x speedup over the CPU is obtained.
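For readers unfamiliar with the baseline, the classic Thomas algorithm mentioned in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. The function names `thomas_solve` and `solve_batch_interleaved` are hypothetical. The batched helper assumes that a parallel Thomas approach assigns one independent system to each GPU thread and stores element i of system s at flat index `i * n_sys + s`, so that neighbouring threads read neighbouring addresses (coalesced access). The abstract does not spell out either detail.

```python
def thomas_solve(a, b, c, d):
    """Solve one tridiagonal system with the classic Thomas algorithm.

    a: sub-diagonal (a[0] is unused), b: main diagonal,
    c: super-diagonal (c[-1] is unused), d: right-hand side.
    No pivoting is performed, so the matrix is assumed to be
    well conditioned for this (e.g. diagonally dominant).
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward elimination.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_batch_interleaved(a, b, c, d, n_sys):
    """Solve n_sys independent systems stored in an interleaved layout.

    Assumed layout (hypothetical): element i of system s lives at flat
    index i * n_sys + s. The loop over s stands in for the GPU threads;
    on a GPU each iteration would run concurrently.
    """
    solutions = []
    for s in range(n_sys):
        # The stride-n_sys slice gathers all elements of system s.
        solutions.append(
            thomas_solve(a[s::n_sys], b[s::n_sys], c[s::n_sys], d[s::n_sys])
        )
    return solutions
```

With the interleaved layout, step i of the elimination touches the contiguous range `[i * n_sys, (i + 1) * n_sys)` across all systems, which is the access pattern usually described as coalesced. Storing each system contiguously instead would make neighbouring threads read addresses a full system length apart.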
About the journal:
An International Journal on Mechanical Sciences and Engineering Applications
With papers from industry, Research and Development departments and academic institutions, this journal acts as an interface between research and industry, coordinating and disseminating scientific and technical mechanical research in relation to industrial activities.
Targeted readers are technicians, engineers, executives, researchers, and teachers who are working in industrial companies as managers or in Research and Development departments, technical centres, laboratories, universities, technical and engineering schools. The journal is an AFM (Association Française de Mécanique) publication.