Fused GEMMs towards an efficient GPU implementation of the ADER‐DG method in SeisSol

Ravil Dorozhinskii, G. B. Gadeschi, Michael Bader
Concurrency and Computation: Practice and Experience. Published 2024-02-13. DOI: 10.1002/cpe.8037 (https://doi.org/10.1002/cpe.8037)

Abstract

This study shows how the GPU performance of the ADER discontinuous Galerkin (ADER-DG) method in SeisSol, an earthquake simulation software package, can be further improved while preserving the original design that ensures high CPU performance. We introduce a new code generator, "ChainForge", that fuses consecutive batched matrix multiplications ("GEMMs") into a single GPU kernel and holds intermediate results in shared memory for as long as necessary. The generator operates as an external module linked against YATeTo, SeisSol's domain-specific language; as a result, the original SeisSol source code remains largely unchanged. We discuss several challenges related to the automatic fusion of GPU kernels and provide solutions to them. Overall, we gain 60% in performance of SeisSol's wave propagation solver using fused GEMMs compared to the original GPU implementation. We demonstrate this on benchmarks as well as on a real production scenario that simulates the 1994 Northridge earthquake.
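The fusion idea described in the abstract can be illustrated with a minimal CPU-side sketch. It is not ChainForge output and not SeisSol code: the tile size `N`, the type `Mat`, and the functions `gemm`, `chain_unfused`, and `chain_fused` are hypothetical names chosen for illustration, and the chain D = (A·B)·C with operators B and C shared across the batch is an assumed shape. The unfused variant stores the intermediate product of every batch element in a batch-sized buffer between two passes, which on a GPU would correspond to global memory traffic between two kernels. The fused variant keeps the intermediate in a small per-element scratch tile, which plays the role that shared memory plays in a fused GPU kernel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed tile size; the real matrix shapes are not given in the abstract.
constexpr std::size_t N = 4;
using Mat = std::array<float, N * N>;  // row-major N x N matrix

// out = lhs * rhs for one pair of small dense matrices.
void gemm(const Mat& lhs, const Mat& rhs, Mat& out) {
  for (std::size_t i = 0; i < N; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < N; ++k) {
        acc += lhs[i * N + k] * rhs[k * N + j];
      }
      out[i * N + j] = acc;
    }
  }
}

// Unfused: two passes over the batch. The intermediate T = A * B of every
// element is written to a batch-sized buffer and read back in the second pass
// (on a GPU: two kernel launches with a round trip through global memory).
std::vector<Mat> chain_unfused(const std::vector<Mat>& a, const Mat& b, const Mat& c) {
  std::vector<Mat> tmp(a.size());
  for (std::size_t e = 0; e < a.size(); ++e) gemm(a[e], b, tmp[e]);  // "kernel 1"
  std::vector<Mat> d(a.size());
  for (std::size_t e = 0; e < a.size(); ++e) gemm(tmp[e], c, d[e]);  // "kernel 2"
  return d;
}

// Fused: one pass over the batch. The intermediate lives in a small scratch
// tile that is reused for every element (on a GPU: shared memory of the
// thread block that processes the element).
std::vector<Mat> chain_fused(const std::vector<Mat>& a, const Mat& b, const Mat& c) {
  std::vector<Mat> d(a.size());
  Mat scratch{};
  for (std::size_t e = 0; e < a.size(); ++e) {
    gemm(a[e], b, scratch);  // T = A * B stays local
    gemm(scratch, c, d[e]);  // D = T * C consumes it immediately
  }
  return d;
}
```

Both variants perform the same floating-point operations in the same order, so they return identical results; the difference is only where the intermediate is stored. In this sketch the unfused path needs extra storage proportional to the batch size, while the fused path needs a single `N × N` tile.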