Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation

2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Pub Date: 2019-05-20
DOI: 10.1109/IPDPS.2019.00057

I. Yamazaki, Z. Bai, Ding Lu, J. Dongarra


Citations: 4

Abstract

Some scientific and engineering applications need to compute a large number of eigenpairs of a large Hermitian matrix. Although the Lanczos method is effective for computing a few eigenvalues, computing a large number of eigenpairs with it can be expensive in both computation and communication. To improve its performance, we study an s-step variant of thick-restart Lanczos (TRLan) combined with explicit external deflation (EED). The s-step method generates s basis vectors at a time, which reduces the communication cost of generating the basis. We then design a specialized matrix powers kernel (MPK) that further reduces communication and computational costs by exploiting special properties of the deflation matrix. We conducted numerical experiments with the new TRLan eigensolver on synthetic matrices and on matrices from electronic structure calculations. Performance results on the Cori supercomputer at the National Energy Research Scientific Computing Center (NERSC) demonstrate the potential of the specialized MPK to significantly reduce the execution time of the TRLan eigensolver: we obtained speedups of up to 3.1× in sequential runs and up to 5.3× in parallel runs.
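
To make the idea of a deflation-aware matrix powers kernel concrete, the following is a minimal sketch and not the kernel from the paper. It assumes a real symmetric matrix and a deflated operator of the form A + sigma * U U^T, where the columns of U are exact, orthonormal, already-converged eigenvectors of A; the abstract does not state this form, so treat it as an assumption. Under that assumption, the s monomial basis vectors can be formed with a single projection onto U instead of one projection per step. All function names here are hypothetical.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def deflated_matvec(A, U, sigma, x):
    """Apply (A + sigma * U U^T) to x; U is a list of orthonormal vectors."""
    y = matvec(A, x)
    for u in U:
        c = sigma * dot(u, x)  # one inner product (a reduction) per step
        y = [yi + c * ui for yi, ui in zip(y, u)]
    return y

def mpk_naive(A, U, sigma, v, s):
    """Monomial basis [v, Ad v, ..., Ad^s v], projecting onto U at every step."""
    basis = [v]
    for _ in range(s):
        basis.append(deflated_matvec(A, U, sigma, basis[-1]))
    return basis

def mpk_split(A, U, lams, sigma, v, s):
    """Same basis with a single projection onto U.

    Assumes A u_i = lam_i u_i exactly (assumption, not from the source), so
    (A + sigma U U^T)^k v = A^k v + sum_i ((lam_i + sigma)^k - lam_i^k) (u_i^T v) u_i.
    """
    coeffs = [dot(u, v) for u in U]  # the only inner products needed
    basis = [v]
    p = v
    for k in range(1, s + 1):
        p = matvec(A, p)  # plain power A^k v, no deflation term
        w = list(p)
        for u, lam, c in zip(U, lams, coeffs):
            f = ((lam + sigma) ** k - lam ** k) * c
            w = [wi + f * ui for wi, ui in zip(w, u)]
        basis.append(w)
    return basis

# Small illustrative example: (1, [1, -1, 0] / sqrt(2)) is an eigenpair of A.
A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, 5.0]]
r = 1.0 / math.sqrt(2.0)
U = [[r, -r, 0.0]]
lams = [1.0]
v = [1.0, 2.0, 3.0]

naive = mpk_naive(A, U, 10.0, v, 3)
split = mpk_split(A, U, lams, 10.0, v, 3)
max_diff = max(abs(a - b) for x, y in zip(naive, split) for a, b in zip(x, y))
```

In a distributed setting, each inner product with U is a global reduction, so moving from one projection per step to one projection per block of s steps is the kind of communication saving an s-step method aims for. A Hermitian (complex) matrix would need conjugated inner products, and the monomial basis used here is chosen only for simplicity.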
