Distributed GPU Based Matrix Power Kernel for Geoscience Applications

A. Sedrakian, T. Guignon
DOI: 10.2118/203947-ms (https://doi.org/10.2118/203947-ms)
Published: 2021-10-19, in "Day 1 Tue, October 26, 2021"
Citations: 1

Abstract

High-performance computing is at the heart of the digital technology that makes it possible to simulate complex physical phenomena. The current trend in hardware architectures is toward heterogeneous systems in which multi-core CPUs are accelerated by GPUs to deliver high computing power. The demand for fast solution of geoscience simulations, coupled with these new computing architectures, drives the need for more sophisticated parallel algorithms. Such applications, which are based on partial differential equations, require solving large, sparse linear systems of equations. This work advances the Matrix Powers Kernel (MPK), a crucial kernel for solving sparse linear systems with communication-avoiding methods. This class of methods addresses the performance degradation observed beyond several nodes by narrowing the gap between the time needed to perform the computations and the time needed to communicate the results. The proposed work is a new formulation of distributed MPK for clusters of GPUs in which pipelined communication can be overlapped with computation. In addition, an appropriate data reorganization decreases the memory traffic between processors and accelerators and improves performance. The proposed structure separates local and external components, with several layers of interface nodes (a consequence of the MPK algorithm). The data are reordered so that all entries required by neighboring processes are stored contiguously at the end, after the local entries. An assembly step determines the contents of the messages sent to each neighbor. This data structure has a major impact on the efficiency of the solution, since it permits a communication scheme in which computation on local data can run on the GPUs while computation on external data runs on the CPUs. Moreover, it permits more efficient inter-process communication by effectively overlapping communication with computation in an asynchronous, pipelined fashion.

We validate the design on test cases with different block matrices obtained from different reservoir simulations: a fractured-reservoir dual-medium model, a black-oil two-phase flow model, and a three-phase flow model. The experimental results demonstrate the performance of the proposed approach compared with the state of the art. Running on several nodes of a GPU cluster, the proposed MPK delivers a significant performance gain over an equivalent, already optimized sparse matrix-vector product (SpMV) and scales better.
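
To make the ideas in the abstract concrete, the following is a minimal, single-process Python sketch, not the authors' implementation. It assumes the usual definition of the matrix powers kernel as the computation of [Ax, A^2 x, ..., A^s x], and it illustrates two points from the abstract: the layers of interface (ghost) entries that a process needs in order to take s steps without further communication, and an ordering that places local entries first and external entries contiguously after them. All function names, the list-of-rows sparse format, and the tridiagonal test matrix are assumptions made for illustration; GPU execution, MPI messaging, and the communication/computation overlap are deliberately left out.

```python
# Sparse matrix stored row by row: rows[i] is a list of (column, value) pairs.
# This format and every name below are illustrative assumptions.

def spmv(rows, x):
    """Plain sparse matrix-vector product y = A x."""
    return [sum(a * x[j] for j, a in row) for row in rows]


def matrix_powers(rows, x, s):
    """Reference MPK: return [A x, A^2 x, ..., A^s x] using s global SpMVs."""
    basis, v = [], x
    for _ in range(s):
        v = spmv(rows, v)
        basis.append(v)
    return basis


def ghost_layers(rows, owned, s):
    """Layer k (1-based) holds the external indices first reached k hops
    away from the owned rows in the sparsity graph of A."""
    seen, frontier, layers = set(owned), set(owned), []
    for _ in range(s):
        reached = {j for i in frontier for j, _ in rows[i]} - seen
        layers.append(sorted(reached))
        seen |= reached
        frontier = reached
    return layers


def local_mpk(rows, x, owned, s):
    """Compute A^k x on the owned rows for k = 1..s after a single fetch
    of the ghost values of x (stands in for one message exchange)."""
    layers = ghost_layers(rows, owned, s)
    # Local entries first, then external layers stored contiguously.
    order = list(owned) + [j for layer in layers for j in layer]
    v = {i: x[i] for i in order}
    result = []
    for k in range(1, s + 1):
        # Step k only needs the owned rows plus the first s - k ghost layers.
        active = list(owned) + [j for layer in layers[:s - k] for j in layer]
        v = {i: sum(a * v[j] for j, a in rows[i]) for i in active}
        result.append([v[i] for i in owned])
    return order, result


def tridiagonal(n):
    """1D Laplacian-like test matrix with stencil (-1, 2, -1)."""
    rows = []
    for i in range(n):
        row = [(i, 2)]
        if i > 0:
            row.append((i - 1, -1))
        if i < n - 1:
            row.append((i + 1, -1))
        rows.append(row)
    return rows


A = tridiagonal(8)
x = [i * i for i in range(8)]
owned = [0, 1, 2, 3]                      # rows held by one hypothetical process
order, local = local_mpk(A, x, owned, s=2)
reference = [[b[i] for i in owned] for b in matrix_powers(A, x, 2)]
print(order, local == reference)
```

In this sketch the ghost rows are recomputed redundantly by the owning process, which is the classic price communication-avoiding schemes pay for exchanging data once per s steps rather than once per SpMV. The abstract's scheme goes further by assigning the local part to GPUs and the external part to CPUs and by pipelining the exchanges, none of which is modelled here.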