An algorithm to parallelise parton showers on a GPU

IF 4.6 | Zone 2, Physics and Astrophysics | Q1 PHYSICS, MULTIDISCIPLINARY | SciPost Physics | Pub Date: 2024-08-12 | DOI: 10.21468/scipostphyscodeb.33
Michael H. Seymour, Siddharth Sule
Citations: 0

Abstract

The Single Instruction, Multiple Thread (SIMT) paradigm of GPU programming does not support the branching nature of a parton shower algorithm by definition. However, modern GPUs are designed to schedule threads with diverging processes independently, allowing them to handle such branches. With regular thread synchronisation and careful treatment of the individual steps, one can simulate a parton shower on a GPU. We present a Sudakov veto algorithm designed to simulate parton branching on multiple events in parallel. We also release a CUDA C++ program that generates matrix elements, showers partons and computes jet rates and event shapes for LEP at 91.2 GeV on a GPU. To benchmark its performance, we also provide a near-identical C++ program designed to simulate events serially on a CPU. While the consequences of branching are not absent, we demonstrate that a GPU can provide the throughput of a many-core CPU. As an example, we show that the time taken to shower $10^6$ events on one NVIDIA TESLA V100 GPU is equivalent to that of 295 Intel Xeon E5-2620 CPU cores.
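
To make the idea of a veto algorithm running over many events in step more concrete, here is a minimal serial C++ sketch. It is not the authors' released code. The toy emission rate, the constants `OVER_C` and `T_CUT`, the `Event` struct and the function names are all assumptions made for illustration, and the inner loop over events merely stands in for what would be one GPU thread per event between synchronisation points.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical toy model (an assumption, not the paper's physics): the scale t
// only decreases, the true emission rate is f(t) = OVER_C * w(t) / t with
// 0 <= w(t) <= 1, and g(t) = OVER_C / t is an analytic overestimate of f(t).
constexpr double OVER_C = 0.5;  // assumed strength of the overestimate
constexpr double T_CUT = 1.0;   // assumed cutoff scale where showering stops

struct Event {
    double t;       // current evolution scale
    int emissions;  // accepted branchings so far
    bool active;    // false once t has dropped below T_CUT
};

// Acceptance weight w(t) = f(t)/g(t); it lies in [0, 1) for t >= T_CUT.
inline double accept_weight(double t) { return 1.0 - T_CUT / t; }

// One trial emission for one event. For the overestimate, the no-emission
// probability between t and t' is (t'/t)^OVER_C; setting it equal to a uniform
// random number r gives t' = t * r^(1/OVER_C). The trial is then accepted with
// probability w(t'), otherwise it is vetoed.
inline void trial_step(Event& ev, std::mt19937_64& rng) {
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    ev.t *= std::pow(uni(rng), 1.0 / OVER_C);
    if (ev.t < T_CUT) {
        ev.active = false;  // this event has finished showering
        return;
    }
    if (uni(rng) < accept_weight(ev.t)) {
        ++ev.emissions;
    }
    // A vetoed trial still lowers t: the next trial starts from the vetoed scale.
}

// Lockstep driver: each pass gives every still-active event exactly one trial.
// Finished events simply idle while the rest continue, which is the serial
// analogue of threads waiting on divergent branches.
inline std::vector<Event> shower_all(std::size_t n_events, double t_start,
                                     unsigned seed) {
    assert(t_start > 0.0);
    std::vector<Event> events(n_events, Event{t_start, 0, true});
    std::vector<std::mt19937_64> rngs;  // one generator per event
    rngs.reserve(n_events);
    for (std::size_t i = 0; i < n_events; ++i) {
        rngs.emplace_back(seed + i);
    }
    bool any_active = true;
    while (any_active) {
        any_active = false;
        for (std::size_t i = 0; i < n_events; ++i) {  // i plays the thread index
            if (!events[i].active) continue;
            trial_step(events[i], rngs[i]);
            any_active = any_active || events[i].active;
        }
    }
    return events;
}
```

Calling `shower_all(1000, 91.2 * 91.2, 12345u)` would shower a thousand toy events from a starting scale chosen here only to echo the LEP energy. Giving each event its own random number generator keeps the result independent of the order in which events are visited, which is the property a parallel version would need.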
Source journal: SciPost Physics
Subject category: Physics and Astronomy, Physics and Astronomy (all)
CiteScore: 8.20
Self-citation rate: 12.70%
Articles published: 315
Review time: 10 weeks
Journal description: SciPost Physics publishes breakthrough research articles in the whole field of Physics, covering Experimental, Theoretical and Computational approaches. Specialties covered by this Journal: - Atomic, Molecular and Optical Physics - Experiment - Atomic, Molecular and Optical Physics - Theory - Biophysics - Condensed Matter Physics - Experiment - Condensed Matter Physics - Theory - Condensed Matter Physics - Computational - Fluid Dynamics - Gravitation, Cosmology and Astroparticle Physics - High-Energy Physics - Experiment - High-Energy Physics - Theory - High-Energy Physics - Phenomenology - Mathematical Physics - Nuclear Physics - Experiment - Nuclear Physics - Theory - Quantum Physics - Statistical and Soft Matter Physics.