An Implementation of Block Conjugate Gradient Algorithm on CPU-GPU Processors

Hao Ji, M. Sosonkina, Yaohang Li
Published in: 2014 Hardware-Software Co-Design for High Performance Computing
Publication date: 2014-11-16
DOI: 10.1109/Co-HPC.2014.10
URL: https://doi.org/10.1109/Co-HPC.2014.10
Citations: 6

Abstract

In this paper, we investigate the implementation of the Block Conjugate Gradient (BCG) algorithm on CPU-GPU processors. By analyzing the performance of the various matrix operations in BCG, we identify the construction of new search direction matrices as the main performance bottleneck. Replacing the QR decomposition with an eigendecomposition of a small matrix remedies the problem by reducing the computational cost of generating orthogonal search directions. Moreover, a hybrid (offload) computing scheme is designed to enable the BCG implementation to handle linear systems with large, sparse coefficient matrices that cannot fit in the GPU memory. The hybrid scheme offloads matrix operations to GPU processors while helping to hide the CPU-GPU memory transaction overhead. We compare the performance of our BCG implementation with that of an implementation on a CPU with Intel Xeon Phi coprocessors using the automatic offload mode. With a sufficient number of right-hand sides, the CPU-GPU implementation of BCG can reach a speedup of 2.61 over the CPU-only implementation, which is significantly higher than that of the CPU-Intel Xeon Phi implementation.
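The abstract does not say which small matrix is decomposed, so the following is only a minimal sketch of one common way to obtain orthogonal directions without QR, not the authors' implementation. It assumes the small matrix is the s x s Gram matrix G = P^T P of an n x s block P (s being the number of right-hand sides). If G = V diag(w) V^T, then Q = P V diag(w)^(-1/2) satisfies Q^T Q = I. The names `gram`, `jacobi_eigh`, and `orthonormalize` are hypothetical, and the hand-written Jacobi eigensolver only keeps the sketch free of dependencies; in practice a library eigensolver would take its place.

```python
import math

def gram(P):
    """Small s x s Gram matrix P^T P of an n x s block P (given as a list of rows)."""
    s = len(P[0])
    return [[sum(r[i] * r[j] for r in P) for j in range(s)] for i in range(s)]

def jacobi_eigh(G, sweeps=100):
    """Eigendecompose a small symmetric matrix, G = V diag(w) V^T, by Jacobi rotations."""
    s = len(G)
    A = [row[:] for row in G]
    V = [[1.0 if i == j else 0.0 for j in range(s)] for i in range(s)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(s) for j in range(s) if i != j)
        if off <= 1e-28 * sum(A[i][i] ** 2 for i in range(s)):
            break  # off-diagonal part is negligible: A is numerically diagonal
        for p in range(s - 1):
            for q in range(p + 1, s):
                if A[p][q] == 0.0:
                    continue
                # Angle that zeroes A[p][q]: tan(2*phi) = 2*A[p][q] / (A[q][q] - A[p][p]).
                d = A[q][q] - A[p][p]
                y = 2.0 * A[p][q]
                phi = 0.5 * (math.atan2(y, d) if d >= 0 else math.atan2(-y, -d))
                c, t = math.cos(phi), math.sin(phi)
                for M in (A, V):      # column update: M <- M J
                    for k in range(s):
                        mp, mq = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * mp - t * mq, t * mp + c * mq
                for k in range(s):    # row update: A <- J^T A
                    ap, aq = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * ap - t * aq, t * ap + c * aq
    return [A[i][i] for i in range(s)], V

def orthonormalize(P, drop=1e-12):
    """Return Q = P V diag(w)^(-1/2); its columns are orthonormal and span range(P)."""
    w, V = jacobi_eigh(gram(P))
    # Assumption of this sketch: near-dependent directions (tiny eigenvalues) are dropped.
    keep = [j for j in range(len(w)) if w[j] > drop * max(w)]
    return [[sum(r[i] * V[i][j] for i in range(len(w))) / math.sqrt(w[j]) for j in keep]
            for r in P]
```

Calling `orthonormalize(P)` on a tall block and then `gram` on the result should give a matrix close to the identity. The appeal of this route is that the only operations touching the long dimension n are two matrix products (P^T P and P V), while the decomposition itself acts on an s x s matrix. The price is that forming P^T P squares the condition number of P, which is why the sketch drops directions whose eigenvalues are negligible.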