Untangling Modern Parallel Programming Models

International Workshop on OpenCL. Pub Date: 2022-05-10. DOI: 10.1145/3529538.3529987

M. Kinsner, Ben Ashbaugh, James C. Brodman, G. Lueck, S. Pennycook, Roland Schulz



Abstract

Modern hardware is increasingly diverse, spanning CPUs, GPUs, FPGAs, and more, with new architectures constantly emerging. To differentiate these devices, each is typically built around an architecture optimized for particular classes of application or patterns of parallelism. Numerous computational cores, varying levels of hardware vectorization, and other degrees of architectural freedom exist across the many hardware options. The need to use diverse hardware efficiently has led to the emergence of a wide variety of programming models, execution models, and languages, and simultaneously to a complex landscape of confused and often conflicting terminology and abstractions. This reality makes it challenging for developers to comprehend and then choose a programming model that fits their applications and mental model, particularly when more than one target architecture or vendor is of interest.

This talk strives to untangle the landscape of modern parallel programming models, to help developers understand how the models and options relate to each other, and to frame how to think about their specific algorithms when expressing them in code. Although experienced developers typically understand much of the terminology and the relationships between models, a holistic presentation of the material is of strong value, as evidenced by feedback from parallel programming experts who have seen previews of this presentation.

To begin, a brief overview will frame parallel programming and offload compute programming models, followed by a characterization of the Single Program Multiple Data (SPMD) abstract model and the power it exhibits when mapping to multiple classes of architecture. We will discuss how fundamental design decisions within a compiler affect the mapping from source code to an underlying programming model, highlighting that the same code can be lowered to multiple models. This is particularly relevant in the presence of vector data types, which permit multiple interpretations and are a common cause of confusion. A core element of the presentation is a breakdown of how a programming model and the design assumptions of a compiler are ideally understood together by developers, to streamline the creation and tuning of performant code.

SPMD and explicit Single Instruction Multiple Data (SIMD) programming models will be discussed relative to the Khronos OpenCL and SYCL standards, as well as to OpenMP and CUDA, with the aim of clarifying the concepts and models for developers working in specific languages. The talk will conclude with an overview of an experimental extension to SYCL that proposes a mechanism for mixing SPMD and explicit SIMD programming styles with clear semantics and boundaries in code. The talk will show that providing clear points of transition with clear semantics can enable expert tuning at the granularity of a single line of code, without breaking the SPMD programming abstraction used by the rest of a kernel.

Parallel programming models such as SPMD and SIMD are critical in the modern landscape of heterogeneous compute architectures. When these models are coupled with decisions made during the implementation of specific compilers, developers face a complex task in understanding how concepts and hardware mappings interact. This talk describes the most common programming models exposed through SYCL, OpenCL, OpenMP, and CUDA, with the intent of clarifying misconceptions and confusion about the mapping of software to hardware. Attendees will leave the presentation with a holistic understanding of how SPMD and SIMD-like programming models fit together, and how they relate to the code that many of us write from day to day.
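The distinction the abstract draws between SPMD and explicit SIMD styles can be illustrated with a minimal plain C++ sketch. It is not taken from the talk and uses no SYCL, OpenCL, OpenMP, or CUDA API: `launch_spmd`, `Vec4`, `vec_load`, `vec_store`, `axpy`, and the width `kLanes` are hypothetical names invented here, and the sequential launcher only stands in for whatever mapping a real compiler and runtime might choose.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// SPMD style: the body is written for ONE work-item. Nothing in the source
// says whether instances become threads, SIMD lanes, or a mix of both.
inline void saxpy_item(std::size_t i, float a, const float* x, float* y) {
    y[i] = a * x[i] + y[i];
}

// Hypothetical stand-in for a runtime launch: runs the instances one by one.
// A real implementation would be free to pick a different mapping.
template <typename Body>
void launch_spmd(std::size_t n, Body body) {
    for (std::size_t i = 0; i < n; ++i) body(i);
}

// Explicit SIMD style: the programmer fixes the vector width and writes
// whole-vector operations. Vec4 is an illustrative type, not a real API.
constexpr std::size_t kLanes = 4;
struct Vec4 {
    std::array<float, kLanes> lane{};
};

inline Vec4 vec_load(const float* p) {
    Vec4 v;
    for (std::size_t l = 0; l < kLanes; ++l) v.lane[l] = p[l];
    return v;
}

inline void vec_store(float* p, const Vec4& v) {
    for (std::size_t l = 0; l < kLanes; ++l) p[l] = v.lane[l];
}

inline Vec4 axpy(float a, const Vec4& x, const Vec4& y) {
    Vec4 r;
    for (std::size_t l = 0; l < kLanes; ++l) r.lane[l] = a * x.lane[l] + y.lane[l];
    return r;
}

void saxpy_simd(std::size_t n, float a, const float* x, float* y) {
    std::size_t i = 0;
    // Full vectors: each iteration handles kLanes elements at once.
    for (; i + kLanes <= n; i += kLanes) {
        vec_store(y + i, axpy(a, vec_load(x + i), vec_load(y + i)));
    }
    // Remainder elements that do not fill a whole vector.
    for (; i < n; ++i) y[i] = a * x[i] + y[i];
}

// Convenience wrappers that take y by value and return the updated copy.
std::vector<float> run_spmd(float a, const std::vector<float>& x, std::vector<float> y) {
    assert(x.size() == y.size());
    const float* xp = x.data();
    float* yp = y.data();
    launch_spmd(y.size(), [=](std::size_t i) { saxpy_item(i, a, xp, yp); });
    return y;
}

std::vector<float> run_simd(float a, const std::vector<float>& x, std::vector<float> y) {
    assert(x.size() == y.size());
    saxpy_simd(y.size(), a, x.data(), y.data());
    return y;
}
```

Both wrappers compute the same values. What differs is where the vector width lives: in the SPMD form it is left to the launcher, while in the explicit SIMD form it is spelled out in the source. The same ambiguity is why a vector type inside an SPMD kernel can be read either as per-item data or as a request for hardware vector operations.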



