Improved address space inference for SYCL programs
Ross Brunton, V. Lomüller. International Workshop on OpenCL, 10 May 2022. DOI: 10.1145/3529538.3529998
SYCL[4, 6] is a single-source, C++-based programming model for heterogeneous programming. It lets programmers write or port code for heterogeneous accelerators using what looks to them like standard C++. Reaching peak performance, however, can require writing code in a form that lets the compiler exploit specific hardware features. If the compiler can exploit these features without the programmer having to think about them, both productivity and application performance improve. Accelerators with multiple address spaces are one such case. This technical talk will describe how a SYCL compiler can infer these address spaces without requiring programmers to specify them in their applications, and the evolution of the specification needed to cope better with the new SYCL 2020 features.
Hardware devices can have multiple memory regions with different visibility and performance. Like OpenCL C[5], SYCL abstracts them into global memory, visible to all work-items; local memory, visible to the work-items of a single work-group; and private memory, visible only to a single work-item. In OpenCL C, programmers express address spaces with type qualifiers that statically encode the memory region a pointer addresses, so whenever an address space is specified, the compiler can check that the program is well-formed. Requiring explicit address spaces, however, costs usability: they must be built into the program design, and they are a barrier to integrating code not written with them in mind. OpenCL C 2.x and 3.0 therefore let programmers use the unnamed generic address space instead. SYCL, on the other hand, does not extend the C++ language, so programmers cannot express address spaces with type qualifiers, which the C++ standard does not define.
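To make the contrast between explicit qualifiers and the generic address space concrete, here is a toy model in plain C++17. It is not OpenCL C or any SYCL API: `addr_space`, `qualified_ptr`, `generic_ptr`, `load_static`, `load_generic` and `dispatch_log` are hypothetical names chosen for illustration, and all three "regions" live in ordinary host memory. Encoding the region in the pointer type lets the compiler reject mixing regions, while a generic pointer carries the region only as a run-time tag that must be inspected on every access.

```cpp
#include <cassert>
#include <type_traits>

// Toy model of the three memory regions (hypothetical, not a real API).
enum class addr_space { global_space, local_space, private_space };

// Pointer whose address space is part of its type, in the spirit of an
// OpenCL C qualifier such as `__global int *`.
template <addr_space AS, typename T>
struct qualified_ptr {
  T* raw;
};

// Generic pointer: the address space is erased from the type and kept only
// as a run-time tag, in the spirit of the unnamed generic address space.
template <typename T>
struct generic_ptr {
  addr_space tag;
  T* raw;
  // Any qualified pointer converts implicitly to a generic one.
  template <addr_space AS>
  generic_ptr(qualified_ptr<AS, T> p) : tag(AS), raw(p.raw) {}
};

// Static checking: the type system rejects turning a global pointer into a
// local one...
static_assert(!std::is_convertible_v<qualified_ptr<addr_space::global_space, int>,
                                     qualified_ptr<addr_space::local_space, int>>);
// ...but any qualified pointer can be widened to the generic address space.
static_assert(std::is_convertible_v<qualified_ptr<addr_space::global_space, int>,
                                    generic_ptr<int>>);

// Counts which region each generic access was dispatched to; the counters
// stand in for the distinct load paths real hardware would take.
struct dispatch_log {
  int global_loads = 0, local_loads = 0, private_loads = 0;
};

// Statically qualified load: the region is known at compile time.
template <addr_space AS, typename T>
T load_static(qualified_ptr<AS, T> p) {
  assert(p.raw != nullptr);
  return *p.raw;
}

// Generic load: the tag has to be inspected at run time before every access.
template <typename T>
T load_generic(generic_ptr<T> p, dispatch_log& counts) {
  assert(p.raw != nullptr);
  switch (p.tag) {
    case addr_space::global_space: ++counts.global_loads; break;
    case addr_space::local_space: ++counts.local_loads; break;
    case addr_space::private_space: ++counts.private_loads; break;
  }
  return *p.raw;
}
```

Converting a `qualified_ptr` to a `generic_ptr` is always allowed, which mirrors why the generic address space eases porting; the price is the `switch` in `load_generic`, which a compiler can only drop when it proves that an access targets a single address space.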
Because SYCL source carries no address space qualifiers, the device compiler can instead lower pointers and references to the unnamed generic address space. This is a virtual address space that can represent several overlapping address spaces at once. The memory being addressed is then no longer statically known to the compiler frontend, and the SYCL implementation relies on the hardware, or on software emulation, to dispatch each load and store to the correct memory. On some hardware targets this flexibility has a performance cost, which can be avoided when the compiler can infer a single address space for a given memory access. Additionally, the low-level compute APIs often used as backends for SYCL 2020 implementations do not guarantee support for a generic address space: it is an optional feature in OpenCL 3.0 and does not exist in Vulkan. A SYCL compiler that can infer all address spaces for a large set of programs can therefore achieve better performance and target a wider range of backend compute APIs. Moreover, recent efforts to bring safety-critical development to SYCL mean that it will also need to run on top of Vulkan SC. A well-defined specification for inferring address spaces therefore remains relevant to SYCL.
The address space inference rules introduced by SYCL 1.2.1 impose significant restrictions on user code. One striking example is the "defaulting rule": when a pointer declaration has no initializer, the pointer is assumed to address private memory, even if it is initialized in the very next statement. As a consequence, a pointer declared as a structure member always defaults to the private address space. In practice, however, these restrictions were not a significant barrier under SYCL 1.2.1: large applications such as Eigen[3] were ported to SYCL, and new ones such as SYCL-BLAS[1] and SYCL-DNN[2] were built with it. SYCL 2020 brought significant changes that give users more flexibility.
Two of these changes matter most here: the unnamed generic address space and unified shared memory (USM) pointers. The generic address space lifts the restrictions imposed by SYCL 1.2.1, so programs written for SYCL 2020 that rely on it are unlikely to compile under the restrictive 1.2.1 inference rules. USM encourages the use of raw pointers instead of accessors, which quickly leads to passing these pointers inside structures. Because a USM pointer in fact addresses the global memory region, while the defaulting rule assigns a pointer declared in a structure to private memory, this creates a conflict with the inference rules. This talk will describe an experimental compiler for ComputeCpp, Codeplay's SYCL implementation, which uses an improved address space inference method that copes efficiently with SYCL 2020 features such as the generic address space and USM pointers. The talk will also cover the limitations of this approach.
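The conflict between the defaulting rule and USM pointers stored in structures can be reproduced with a toy model. The C++17 sketch below is not ComputeCpp's inference, whose details this abstract does not give: `space`, `stmt`, `result`, `infer_defaulting` and `infer_join` are hypothetical names, the "program" is assumed to be a straight-line list of pointer declarations and assignments, and `infer_join` is a simple flow-insensitive join chosen only for illustration.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

enum class space { global_space, local_space, private_space, generic_space };

// One statement of a toy straight-line program:
//   decl:   `T* name;` (src empty) or `T* name = <value in src>;`
//   assign: `name = <value in src>;`
struct stmt {
  enum kind_t { decl, assign } kind;
  std::string name;
  std::optional<space> src;  // address space of the value, if any
};

struct result {
  std::map<std::string, space> spaces;  // inferred space per pointer
  std::vector<std::string> errors;      // pointers with conflicting uses
};

// Models the SYCL 1.2.1 "defaulting rule" as described above: the declaration
// fixes the address space, defaulting to private when there is no initializer.
result infer_defaulting(const std::vector<stmt>& prog) {
  result r;
  for (const stmt& s : prog) {
    if (s.kind == stmt::decl) {
      r.spaces[s.name] = s.src.value_or(space::private_space);
    } else {
      assert(s.src.has_value());  // assignments always carry a value
      // A value from another region (e.g. a global USM pointer) conflicts.
      if (r.spaces.at(s.name) != *s.src) r.errors.push_back(s.name);
    }
  }
  return r;
}

// Hypothetical alternative: ignore missing initializers and join every value
// a pointer receives; one space if they agree, generic otherwise.
result infer_join(const std::vector<stmt>& prog) {
  std::map<std::string, std::optional<space>> seen;
  for (const stmt& s : prog) {
    std::optional<space>& cur = seen[s.name];  // records declared-only names too
    if (!s.src) continue;  // no initializer: no information yet
    if (!cur) cur = s.src;
    else if (*cur != *s.src) cur = space::generic_space;
  }
  result r;
  for (const auto& [name, sp] : seen)
    r.spaces[name] = sp.value_or(space::private_space);  // never assigned
  return r;
}
```

For the struct-member case, modeled as a declaration of `s.p` without initializer followed by an assignment from global memory, `infer_defaulting` fixes `s.p` to private memory and records the global assignment as a conflict, while `infer_join` infers global. A pointer that receives values from two regions ends up generic under `infer_join`, which is the case that still needs hardware support or software emulation on the backend.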