Authors: Anishchandran Chathalingath, A. Manoharan
Journal: International Journal of Grid and High Performance Computing, pages 1-12
Published: 2019-10-01
DOI: 10.4018/ijghpc.2019100101
Performance Optimization of Tridiagonal Matrix Algorithm [TDMA] on Multicore Architectures: Computational Framework and Mathematical Modelling
Fast and efficient tridiagonal solvers are highly valued in scientific and engineering domains, but optimizing them is a challenging task for computer engineers. State-of-the-art developments in multi-core computing pave the way to meeting this challenge to an extent: they provide opportunities to exploit lower levels of parallelism and concurrency in inherently sequential algorithms. In this article, the authors present an optimal-performance, pipelined parallel variant of the conventional Tridiagonal Matrix Algorithm (TDMA), also known as the Thomas algorithm, on a multi-core CPU platform. The implementation, analysis, and performance comparison of the proposed pipelined parallel TDMA and the conventional version are carried out on an Intel SIMD multi-core architecture. The results are compared in terms of elapsed time, speedup, and cache miss rate. For a system of n linear equations with n = 2^36, the presented pipelined parallel TDMA initially achieves a speedup of 1.294X with a parallel efficiency of 43%, and it tends towards linear speedup as the system grows.
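For orientation, the conventional (sequential) Thomas algorithm that the abstract uses as its baseline can be sketched as below. This is a minimal textbook sketch, not the authors' pipelined parallel variant, which the abstract does not describe in detail. The function name `thomas_solve` and the argument names `a`, `b`, `c`, `d` are illustrative assumptions, and the sketch assumes a system that needs no pivoting (for example, a diagonally dominant one).

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the standard sequential Thomas algorithm.

    Hypothetical argument layout (an assumption, not taken from the paper):
      a: sub-diagonal,   length n, a[0] is unused
      b: main diagonal,  length n
      c: super-diagonal, length n, c[n-1] is unused
      d: right-hand side, length n
    No pivoting is performed, so the matrix is assumed to be well behaved
    (e.g. diagonally dominant).
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal coefficients
    dp = [0.0] * n  # modified right-hand side

    # Forward elimination: each step depends on the previous one,
    # which is why the algorithm is inherently sequential.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution, again with a loop-carried dependency.
    x = [0.0] * n
    x[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

Both loops carry a dependency from one iteration to the next, which illustrates why the abstract calls the algorithm inherently sequential and why a parallel variant has to restructure the work rather than simply split the loops across cores.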