Accelerating Lattice QCD Simulations using GPUs

Tilmann Matthaei
{"title":"Accelerating Lattice QCD Simulations using GPUs","authors":"Tilmann Matthaei","doi":"arxiv-2407.00041","DOIUrl":null,"url":null,"abstract":"Solving discretized versions of the Dirac equation represents a large share\nof execution time in lattice Quantum Chromodynamics (QCD) simulations. Many\nhigh-performance computing (HPC) clusters use graphics processing units (GPUs)\nto offer more computational resources. Our solver program, DDalphaAMG,\npreviously was unable to fully take advantage of GPUs to accelerate its\ncomputations. Making use of GPUs for DDalphaAMG is an ongoing development, and\nwe will present some current progress herein. Through a detailed description of\nour development, this thesis should offer valuable insights into using GPUs to\naccelerate a memory-bound CPU implementation. We developed a storage scheme for multiple tuples, which allows much more\nefficient memory access on GPUs, given that the element at the same index is\nread from multiple tuples simultaneously. Still, our implementation of a\ndiscrete Dirac operator is memory-bound, and we only achieved improvements for\nlarge linear systems on few nodes at the JUWELS cluster. These improvements do\nnot currently overcome additional introduced overheads. However, the results\nfor the application of the Wilson-Dirac operator show a speedup of around 3 for\nlarge lattices. If the additional overheads can be eliminated in the future,\nGPUs could reduce the DDalphaAMG execution time significantly for large\nlattices. We also found that a previous publication on the GPU acceleration of\nDDalphaAMG, underrepresented the achieved speedup, because small lattices were\nused. This further highlights that GPUs often require large-scale problems to\nsolve in order to be faster than CPUs","PeriodicalId":501191,"journal":{"name":"arXiv - PHYS - High Energy Physics - Lattice","volume":"133 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - PHYS - High Energy Physics - Lattice","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.00041","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Solving discretized versions of the Dirac equation represents a large share of the execution time in lattice Quantum Chromodynamics (QCD) simulations. Many high-performance computing (HPC) clusters use graphics processing units (GPUs) to offer more computational resources. Our solver program, DDalphaAMG, was previously unable to take full advantage of GPUs to accelerate its computations. Making use of GPUs for DDalphaAMG is an ongoing development, and we present some of the current progress herein. Through a detailed description of our development, this thesis should offer valuable insights into using GPUs to accelerate a memory-bound CPU implementation. We developed a storage scheme for multiple tuples that allows much more efficient memory access on GPUs, given that the element at the same index is read from multiple tuples simultaneously. Still, our implementation of a discrete Dirac operator is memory-bound, and we only achieved improvements for large linear systems on a few nodes of the JUWELS cluster. These improvements do not currently overcome the additional overheads that were introduced. However, the results for the application of the Wilson-Dirac operator show a speedup of around 3 for large lattices. If the additional overheads can be eliminated in the future, GPUs could reduce the DDalphaAMG execution time significantly for large lattices. We also found that a previous publication on the GPU acceleration of DDalphaAMG underrepresented the achieved speedup because small lattices were used. This further highlights that GPUs often require large-scale problems in order to be faster than CPUs.
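For reference, the Wilson-Dirac operator mentioned above is, in its standard form with Wilson parameter r = 1 (the thesis may use a different convention or normalization), the stencil

(D_W \psi)(x) = \left(m_0 + \frac{4}{a}\right)\psi(x) - \frac{1}{2a}\sum_{\mu=1}^{4}\Big[(1-\gamma_\mu)\,U_\mu(x)\,\psi(x+a\hat{\mu}) + (1+\gamma_\mu)\,U_\mu^\dagger(x-a\hat{\mu})\,\psi(x-a\hat{\mu})\Big],

where U_\mu(x) are the gauge links, \gamma_\mu the Dirac matrices, a the lattice spacing and m_0 the bare mass. Because each lattice site only requires a few arithmetic operations per loaded gauge link and spinor component, the operator is memory-bound, which is why the storage layout matters.

The abstract does not spell out the DDalphaAMG storage scheme itself; the following is only a minimal CUDA sketch of the general idea it alludes to: if every thread processes one tuple and all threads read the element at the same index k at the same time, an interleaved (structure-of-arrays-style) layout turns those reads into contiguous, coalesced memory transactions. All names and sizes here (sum_aos, sum_soa, N_TUPLES, TUPLE_LEN) are made up for illustration and are not taken from the thesis.

#include <cuda_runtime.h>
#include <cstdio>

#define N_TUPLES  (1 << 20)   // number of tuples (e.g., lattice sites); illustrative size
#define TUPLE_LEN 12          // elements per tuple (e.g., a 12-component spinor)

// Array-of-structures layout: tuple t occupies elements [t*TUPLE_LEN, (t+1)*TUPLE_LEN).
// Neighbouring threads then read addresses TUPLE_LEN apart -> strided, uncoalesced loads.
__global__ void sum_aos(const float *in, float *out)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= N_TUPLES) return;
    float acc = 0.0f;
    for (int k = 0; k < TUPLE_LEN; ++k)
        acc += in[t * TUPLE_LEN + k];
    out[t] = acc;
}

// Interleaved (structure-of-arrays) layout: element k of all tuples is stored contiguously.
// Neighbouring threads read neighbouring addresses -> coalesced loads.
__global__ void sum_soa(const float *in, float *out)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= N_TUPLES) return;
    float acc = 0.0f;
    for (int k = 0; k < TUPLE_LEN; ++k)
        acc += in[k * N_TUPLES + t];
    out[t] = acc;
}

int main()
{
    float *in, *out;
    cudaMalloc((void **)&in,  (size_t)N_TUPLES * TUPLE_LEN * sizeof(float));
    cudaMalloc((void **)&out, (size_t)N_TUPLES * sizeof(float));
    cudaMemset(in, 0, (size_t)N_TUPLES * TUPLE_LEN * sizeof(float));

    int block = 256, grid = (N_TUPLES + block - 1) / block;
    sum_aos<<<grid, block>>>(in, out);   // strided access pattern
    sum_soa<<<grid, block>>>(in, out);   // coalesced access pattern
    cudaDeviceSynchronize();
    printf("done\n");

    cudaFree(in);
    cudaFree(out);
    return 0;
}

On a memory-bound kernel such as this, the coalesced variant typically approaches the device's peak memory bandwidth, while the strided variant wastes most of each cache line, which is consistent with the abstract's observation that the layout change only pays off once the problem (lattice) is large enough to keep the GPU busy.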