Improved address space inference for SYCL programs

Ross Brunton, V. Lomüller
International Workshop on OpenCL, published 2022-05-10. DOI: 10.1145/3529538.3529998 (https://doi.org/10.1145/3529538.3529998)

Abstract

SYCL [4, 6] is a single-source, C++-based programming model for heterogeneous programming. It lets the programmer write or port code targeting heterogeneous accelerators using what appears to be standard C++. To achieve peak performance, however, it can be necessary to write the code in a form that allows the compiler to target specific hardware features. If the compiler can target these features without requiring the programmer to consider them, both productivity and application performance improve. One such example is accelerators with multiple address spaces. This technical talk will describe how a SYCL compiler can infer these address spaces without requiring the programmer to specify them in the application, and will describe some specification evolution required to better cope with the new SYCL 2020 features.

Hardware devices can have multiple memory regions with different levels of visibility and performance. Similar to OpenCL C [5], SYCL abstracts them into a global memory visible to all work-items, a local memory visible to a single work-group, and a private memory visible only to a single work-item. In OpenCL C, the programmer expresses address spaces using type qualifiers that statically encode the memory region addressed by each pointer, so that when a programmer does specify an address space the compiler can check whether the program is well-formed. But requiring programs to be written with explicit address spaces comes at the expense of usability: the address spaces must be integrated into the program design, and they are a barrier to integrating code that was not written with them in mind. Thus, in OpenCL C 2.x and 3.0, programmers can use the unnamed generic address space instead.

SYCL, on the other hand, does not extend the C++ language, so programmers cannot express address spaces using a type qualifier (the C++ standard does not define them).
Thus, in SYCL, pointers and references can be lowered to this unnamed generic address space by the device compiler. The generic address space is a virtual address space that can represent several overlapping address spaces at the same time. The memory being addressed is then no longer statically known to the compiler frontend, and the SYCL implementation relies on the hardware, or on software emulation, to dispatch loads and stores to the correct memory. On some hardware targets this flexibility comes with a performance cost, which can be avoided when the compiler can infer a single address space for a given memory access. Additionally, the low-level compute APIs often used as backends for a SYCL 2020 implementation do not guarantee support for a generic address space: it is an optional feature in OpenCL 3.0 and non-existent in Vulkan. A SYCL compiler that can infer all address spaces for a large set of programs can therefore achieve better performance and target a wider range of backend compute APIs. Moreover, recent efforts to bring safety-critical development to SYCL mean that it will also need to run on top of Vulkan SC. A well-defined specification for inferring address spaces therefore remains relevant for SYCL.

The rules introduced by SYCL 1.2.1 impose significant restrictions on user code. One striking example is the "defaulting rule": when a pointer declaration has no initializer, the pointer is assumed to address private memory, even if it is initialized in the very next statement. As a consequence, a pointer declared in a structure always defaults to the private address space. In practice, however, these restrictions were not a significant barrier in the context of 1.2.1: large applications such as Eigen [3] were ported to run with SYCL, and new ones such as SYCL-BLAS [1] and SYCL-DNN [2] were built.

SYCL 2020 brought significant changes and added flexibility for users.
Among them are the unnamed generic address space and unified shared memory (USM) pointers. The generic address space lifted the restrictions imposed by 1.2.1, so programs written for SYCL 2020 with the generic address space are unlikely to compile under the restrictions of the inference rules. USM encourages the use of raw pointers instead of the accessor container, which quickly leads to passing these pointers via structures. Because a USM pointer in fact addresses the global memory region, this conflicts with the inference rules.

This talk will describe an experimental compiler for ComputeCpp, Codeplay's SYCL implementation. The compiler employs an improved address space inference method that can efficiently cope with SYCL 2020 features such as the generic address space and USM pointers. The talk will also cover the limitations of this approach.
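
To make the defaulting rule and its conflict with USM pointers concrete, the following is a minimal C++ sketch of a toy model. It is not ComputeCpp's implementation and not the improved method from the talk. The names `AddrSpace`, `PointerDecl`, `infer_declared_space`, `assignment_is_well_formed`, and `check_examples` are hypothetical and introduced only for illustration, and the model assumes that assigning a pointer whose address space differs from the declared one is rejected.

```cpp
#include <cassert>

// The three concrete address spaces named in the abstract. The generic
// address space is left out on purpose: this toy models 1.2.1-style rules,
// where every pointer must end up in exactly one concrete space.
enum class AddrSpace { Global, Local, Private };

// Hypothetical summary of one pointer declaration in device code.
struct PointerDecl {
    bool has_initializer;         // does the declaration carry an initializer?
    AddrSpace initializer_space;  // space of the initializer; ignored if none
};

// Defaulting rule as described in the text: with an initializer, the pointer
// takes the initializer's space; without one, it defaults to private.
inline AddrSpace infer_declared_space(const PointerDecl& decl) {
    return decl.has_initializer ? decl.initializer_space : AddrSpace::Private;
}

// Assumption of this sketch: a later assignment is accepted only when the
// assigned pointer lives in the space fixed at the declaration.
inline bool assignment_is_well_formed(const PointerDecl& decl,
                                      AddrSpace assigned_space) {
    return infer_declared_space(decl) == assigned_space;
}

// Usage: the situations described in the text.
inline void check_examples() {
    // int* p = global_ptr;  -> inferred as global from the initializer.
    PointerDecl p{true, AddrSpace::Global};
    assert(infer_declared_space(p) == AddrSpace::Global);

    // int* q; q = global_ptr;  -> q defaulted to private at its declaration,
    // so the assignment in the very next statement no longer matches.
    PointerDecl q{false, AddrSpace::Private};
    assert(infer_declared_space(q) == AddrSpace::Private);
    assert(!assignment_is_well_formed(q, AddrSpace::Global));

    // struct S { float* data; };  -> the field has no initializer, so it
    // defaults to private; storing a USM pointer (global memory) conflicts.
    PointerDecl field{false, AddrSpace::Private};
    assert(!assignment_is_well_formed(field, AddrSpace::Global));
}
```

In this model the space is fixed at the declaration and never revisited, which is why a structure field that later receives a USM pointer is rejected. An inference method that copes with SYCL 2020 code has to avoid that early commitment in some way; how the experimental compiler does so is not specified in the abstract.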