Soft error resilient QR factorization for hybrid system with GPGPU

Peng Du, P. Luszczek, S. Tomov, J. Dongarra
{"title":"Soft error resilient QR factorization for hybrid system with GPGPU","authors":"Peng Du, P. Luszczek, S. Tomov, J. Dongarra","doi":"10.1145/2133173.2133179","DOIUrl":null,"url":null,"abstract":"The general purpose graphics processing units (GPGPU) are increasingly deployed for scientific computing due to their performance advantages over CPUs. As a result, fault tolerance has become a more serious concern compared to the period when GPGPUs were used exclusively for graphics applications. Using GPUs and CPUs together in a hybrid computing system increases flexibility and performance but also increases the possibility of the computations being affected by soft errors. In this work, we propose a soft error resilient algorithm for QR factorization on such hybrid systems. Our contributions include (1) a checkpointing and recovery mechanism for the left-factor Q whose performance is scalable on hybrid systems; (2) optimized Givens rotation utilities on GPGPUs to efficiently reduce an upper Hessenberg matrix to an upper triangular form for the protection of the right factor R, and (3) a recovery algorithm based on QR update on GPGPUs. Experimental results show that our fault tolerant QR factorization can success- fully detect and recover from soft errors in the entire matrix with little overhead on hybrid systems with GPGPUs.","PeriodicalId":259517,"journal":{"name":"ACM SIGPLAN Symposium on Scala","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"36","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM SIGPLAN Symposium on Scala","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2133173.2133179","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 36

Abstract

The general purpose graphics processing units (GPGPU) are increasingly deployed for scientific computing due to their performance advantages over CPUs. As a result, fault tolerance has become a more serious concern compared to the period when GPGPUs were used exclusively for graphics applications. Using GPUs and CPUs together in a hybrid computing system increases flexibility and performance but also increases the possibility of the computations being affected by soft errors. In this work, we propose a soft error resilient algorithm for QR factorization on such hybrid systems. Our contributions include (1) a checkpointing and recovery mechanism for the left-factor Q whose performance is scalable on hybrid systems; (2) optimized Givens rotation utilities on GPGPUs to efficiently reduce an upper Hessenberg matrix to an upper triangular form for the protection of the right factor R, and (3) a recovery algorithm based on QR update on GPGPUs. Experimental results show that our fault tolerant QR factorization can success- fully detect and recover from soft errors in the entire matrix with little overhead on hybrid systems with GPGPUs.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
GPGPU混合系统的软误差弹性QR分解
通用图形处理单元(GPGPU)由于其性能优于cpu,越来越多地用于科学计算。因此,与gpgpu专门用于图形应用程序的时期相比,容错性已经成为一个更严重的问题。在混合计算系统中同时使用gpu和cpu可以提高灵活性和性能,但也增加了计算受到软错误影响的可能性。在这项工作中,我们提出了一个软误差弹性算法QR分解在这种混合系统。我们的贡献包括:(1)左因子Q的检查点和恢复机制,其性能在混合系统上是可扩展的;(2)优化了gpgpu上的Givens旋转实用程序,有效地将上Hessenberg矩阵约简为上三角形形式,以保护右因子R; (3) gpgpu上基于QR更新的恢复算法。实验结果表明,本文提出的容错QR分解方法在具有gpgpu的混合系统中能够以很小的开销成功地检测和恢复整个矩阵的软错误。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A scalable randomized least squares solver for dense overdetermined systems A parallel ensemble Kalman filter implementation based on modified Cholesky decomposition Mixed-precision block gram Schmidt orthogonalization Weighted dynamic scheduling with many parallelism grains for offloading of numerical workloads to multiple varied accelerators On efficient Monte Carlo preconditioners and hybrid Monte Carlo methods for linear algebra
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1