An algorithm to parallelise parton showers on a GPU

IF 5.4 | Zone 2, Physics and Astrophysics | Q1 PHYSICS, MULTIDISCIPLINARY | SciPost Physics | Pub Date: 2024-08-12 | DOI: 10.21468/scipostphyscodeb.33

Michael H. Seymour, Siddharth Sule


Citations: 0

Abstract


The Single Instruction, Multiple Thread (SIMT) paradigm of GPU programming does not support the branching nature of a parton shower algorithm by definition. However, modern GPUs are designed to schedule threads with diverging processes independently, allowing them to handle such branches. With regular thread synchronisation and careful treatment of the individual steps, one can simulate a parton shower on a GPU. We present a Sudakov veto algorithm designed to simulate parton branching on multiple events in parallel. We also release a CUDA C++ program that generates matrix elements, showers partons and computes jet rates and event shapes for LEP at 91.2 GeV on a GPU. To benchmark its performance, we also provide a near-identical C++ program designed to simulate events serially on a CPU. While the consequences of branching are not absent, we demonstrate that a GPU can provide the throughput of a many-core CPU. As an example, we show that the time taken to shower $10^6$ events on one NVIDIA TESLA V100 GPU is equivalent to that of 295 Intel Xeon E5-2620 CPU cores.
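The idea of running a Sudakov veto algorithm on many events in parallel can be illustrated with a toy model. The sketch below is not the authors' released program: it is plain C++ on a CPU, and `ToyEvent`, `trial_step`, `shower_lockstep`, the overestimated rate `c / t` and the acceptance function `ratio(t)` are all hypothetical names and assumptions made for illustration. Each event carries a single evolution scale `t`. In every pass, each unfinished event generates one trial emission from the overestimate and accepts or vetoes it, and the outer loop repeats until every event has fallen below the cutoff. On a GPU, the inner loop would correspond to one thread per event, with a synchronisation between passes.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical toy event: one evolution scale and an emission counter.
struct ToyEvent {
    double t;            // current evolution scale
    int emissions = 0;   // accepted branchings so far
    bool done = false;   // true once t has dropped below the cutoff
    std::mt19937 rng;    // per-event generator, as each GPU thread would own one
};

// Uniform number in (0, 1), built by hand so the result does not depend on
// the standard library's distribution implementation.
inline double uniform01(std::mt19937& rng) {
    return (static_cast<double>(rng()) + 0.5) / 4294967296.0;
}

// One trial emission for one event (assumed toy model).
// Overestimated rate: c / t, so the no-emission probability between t and
// t_new is (t_new / t)^c. Setting that equal to a uniform R and solving
// gives t_new = t * R^(1/c). The trial is accepted with probability
// ratio(t_new) = true rate / overestimate, otherwise it is vetoed.
template <class Ratio>
void trial_step(ToyEvent& ev, double c, double t_cut, Ratio ratio) {
    if (ev.done) return;  // finished events idle, like masked-off threads
    ev.t *= std::pow(uniform01(ev.rng), 1.0 / c);
    if (ev.t < t_cut) {   // below the cutoff: the shower terminates
        ev.done = true;
        return;
    }
    if (uniform01(ev.rng) < ratio(ev.t)) {
        ++ev.emissions;   // accepted; a real shower would split a parton here
    }
    // Vetoed trials still lower t, which is what makes the algorithm exact.
}

// Advance all events in lockstep until every one of them has finished.
template <class Ratio>
std::vector<ToyEvent> shower_lockstep(int n_events, double t_max, double t_cut,
                                      double c, Ratio ratio, std::uint32_t seed) {
    assert(c > 0.0 && t_cut > 0.0 && t_cut < t_max);
    std::vector<ToyEvent> events;
    events.reserve(n_events);
    for (int i = 0; i < n_events; ++i) {
        ToyEvent ev;
        ev.t = t_max;
        ev.rng.seed(seed + static_cast<std::uint32_t>(i));
        events.push_back(ev);
    }
    bool any_active = true;
    while (any_active) {
        any_active = false;
        for (auto& ev : events) {  // on a GPU: one thread per event
            trial_step(ev, c, t_cut, ratio);
            any_active = any_active || !ev.done;
        }
        // On a GPU, threads would be synchronised here before the next pass.
    }
    return events;
}
```

With a constant acceptance ratio `a`, this toy model has a closed-form check: the fraction of events with no emission should approach `(t_cut / t_max)^(a * c)`, and the mean number of emissions should approach `a * c * ln(t_max / t_cut)`. The cost of divergence is visible in the outer loop: the number of passes is set by the slowest event, while events that finish early sit idle.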


Source journal: SciPost Physics (subject area: Physics and Astronomy, all)

CiteScore: 8.20

Self-citation rate: 12.70%

Articles published: 315

Review time: 10 weeks

Journal description: SciPost Physics publishes breakthrough research articles in the whole field of Physics, covering Experimental, Theoretical and Computational approaches. Specialties covered by this journal:
- Atomic, Molecular and Optical Physics - Experiment
- Atomic, Molecular and Optical Physics - Theory
- Biophysics
- Condensed Matter Physics - Experiment
- Condensed Matter Physics - Theory
- Condensed Matter Physics - Computational
- Fluid Dynamics
- Gravitation, Cosmology and Astroparticle Physics
- High-Energy Physics - Experiment
- High-Energy Physics - Theory
- High-Energy Physics - Phenomenology
- Mathematical Physics
- Nuclear Physics - Experiment
- Nuclear Physics - Theory
- Quantum Physics
- Statistical and Soft Matter Physics
