Adaptive Optimization Modeling of Preconditioned Conjugate Gradient on Multi-GPUs

ACM Transactions on Parallel Computing (IF 1.2; Q3, Computer Science, Theory & Methods) | Pub Date: 2016-12-26 | DOI: 10.1145/2990849

Jiaquan Gao, Yu Wang, Jun Wang, Ronghua Liang

Journal article, pages 16:1-16:33. Full text: https://doi.org/10.1145/2990849

Citations: 5

Abstract

The preconditioned conjugate gradient (PCG) algorithm is a well-known iterative method for solving sparse linear systems in scientific computing. GPU-accelerated PCG algorithms for large problems have recently attracted considerable attention. However, on a given multi-GPU platform, producing a highly parallel PCG implementation for a large problem takes significant time, because several manual steps are needed to tune the related parameters and to select an appropriate storage format for the matrix block assigned to each GPU. This motivates us to propose adaptive optimization modeling of PCG on multi-GPUs, which consists of two main parts: (1) an optimized multi-GPU parallel framework for PCG and (2) profile-based optimization modeling for each of the main components of the PCG algorithm: vector operations, inner products, and sparse matrix-vector multiplication (SpMV). Our model does not construct a new storage format or kernel; instead, it automatically and rapidly generates an optimal parallel PCG algorithm for any problem on a specific multi-GPU platform by integrating existing storage formats and kernels. We use a vector operation kernel, an inner-product kernel, and five popular SpMV kernels as examples to present the idea behind constructing the model. Because our model is general, independent of the problem, and dependent on the resources of the devices, it is constructed only once for each type of GPU. The experiments validate the high efficiency of our proposed model.
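To make the three components named in the abstract concrete, the following is a minimal single-process sketch of PCG in plain Python. It is not the authors' multi-GPU implementation: the Jacobi (diagonal) preconditioner, the CSR storage format, and all function names (`spmv_csr`, `dot`, `axpy`, `pcg`) are assumptions chosen for illustration only. The abstract does not say which preconditioner is used or name the five SpMV kernels.

```python
from math import sqrt


def spmv_csr(row_ptr, col_idx, vals, x):
    """SpMV component: y = A x, with A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def dot(u, v):
    """Inner-product component."""
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha, x, y):
    """Vector-operation component: returns alpha * x + y."""
    return [alpha * a + b for a, b in zip(x, y)]


def pcg(row_ptr, col_idx, vals, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A in CSR format."""
    n = len(b)
    # Assumed preconditioner: Jacobi, i.e. M^{-1} = diag(A)^{-1}.
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                inv_diag[i] = 1.0 / vals[k]

    x = [0.0] * n
    r = list(b)                                   # r = b - A x with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    b_norm = sqrt(dot(b, b)) or 1.0

    for it in range(max_iter):
        if sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        q = spmv_csr(row_ptr, col_idx, vals, p)   # SpMV
        alpha = rz / dot(p, q)                    # inner product
        x = axpy(alpha, p, x)                     # vector operations
        r = axpy(-alpha, q, r)
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = axpy(rz_new / rz, p, z)
        rz = rz_new
    return x, max_iter


# Example: A = [[4, 1], [1, 3]], b = [1, 2]; exact solution is [1/11, 7/11].
row_ptr, col_idx, vals = [0, 2, 4], [0, 1, 0, 1], [4.0, 1.0, 1.0, 3.0]
x, iterations = pcg(row_ptr, col_idx, vals, [1.0, 2.0])
```

Each iteration performs one SpMV, a few inner products, and a few vector updates. These are the three kernel classes for which the abstract says the model is built, so in a GPU setting each helper would be a candidate for a separately selected kernel and storage format.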
