To support the full functionality of <type_traits>, two compile-time extensions were added. They are safe for <type_traits>, which is only used in metaprogramming, so no extra functionality is needed on conformant OpenCL devices [4]. The extensions are required for: specifying pointers to functions in is_member_function_pointer; specifying variadic prototypes in result_of, invoke_result, is_invocable, is_nothrow_invocable, is_member_function_pointer. Example invocation: clang -cl-std=clc++ -I<path to libcxx>/include -DN=10 test.cl
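The traits named in the abstract can be illustrated with a minimal sketch in standard C++17, written as ordinary host code rather than OpenCL kernel code. The types Scaler and AddOne are hypothetical and exist only for this illustration; nothing here is taken from the paper's test.cl, and the sketch makes no claim about which constructs a given OpenCL toolchain accepts.

```cpp
#include <type_traits>

// Hypothetical types, introduced only for illustration.
struct Scaler {
    int factor;
    int apply(int x) const { return x * factor; }
};

struct AddOne {
    float operator()(float x) const noexcept { return x + 1.0f; }
};

// is_member_function_pointer distinguishes member functions from data members.
constexpr bool apply_is_member_fn =
    std::is_member_function_pointer<decltype(&Scaler::apply)>::value;
constexpr bool factor_is_member_fn =
    std::is_member_function_pointer<decltype(&Scaler::factor)>::value;

// invoke_result yields the type returned when AddOne is called with a float.
using AddOneResult = std::invoke_result<AddOne, float>::type;
constexpr bool result_is_float = std::is_same<AddOneResult, float>::value;

// is_invocable and is_nothrow_invocable check callability at compile time.
constexpr bool callable_with_float = std::is_invocable<AddOne, float>::value;
constexpr bool callable_with_ptr = std::is_invocable<AddOne, void*>::value;
constexpr bool nothrow_callable =
    std::is_nothrow_invocable<AddOne, float>::value;

// Everything is evaluated by the compiler; no code runs on a device.
static_assert(apply_is_member_fn, "apply is a member function");
static_assert(!factor_is_member_fn, "factor is a data member");
static_assert(result_is_float, "AddOne returns float");
static_assert(callable_with_float && !callable_with_ptr, "callability check");
static_assert(nothrow_callable, "operator() is declared noexcept");
```

Because every check is a compile-time constant, the sketch is consistent with the abstract's point that these traits are used only in metaprogramming.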
{"title":"Experimenting with C++ libraries in OpenCL kernel code","authors":"Ole Strohm, Anastasia Stulova","doi":"10.1145/3456669.3456675","DOIUrl":"https://doi.org/10.1145/3456669.3456675","url":null,"abstract":"To support the full functionality of <type_traits>, two compile-time extensions were added. They are safe for <type_traits>, which is only used in metaprogramming, so no extra functionality is needed on conformant OpenCL devices [4]. The extensions are required for: specifying pointers to functions in is_member_function_pointer; specifying variadic prototypes in result_of, invoke_result, is_invocable, is_nothrow_invocable, is_member_function_pointer. Example invocation: clang -cl-std=clc++ -I<path to libcxx>/include -DN=10 test.cl","PeriodicalId":73497,"journal":{"name":"International Workshop on OpenCL","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87362824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trip down the compute pipeline","authors":"Łukasz Towarek","doi":"10.1145/3456669.3456676","DOIUrl":"https://doi.org/10.1145/3456669.3456676","url":null,"abstract":"","PeriodicalId":73497,"journal":{"name":"International Workshop on OpenCL","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84263255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rod Burns, I. Vorobtsov, Aksel Alpay, R. Keryell, Michael Steyer, Gavin Brown
SYCL is a programming model that lets developers support a wide variety of devices (CPUs, GPUs, and more) from a single code base. Given the growing heterogeneity of processor roadmaps, moving to a platform-independent model such as SYCL is essential for modern software developers. SYCL has the further advantage of supporting a single-source style of programming from completely standard C++. In this tutorial, we will introduce SYCL and provide programmers with a solid foundation they can build on to gain mastery of this language. This is a hands-on tutorial. The real learning will happen as students write code. The format will be short presentations followed by hands-on exercises. Hence, attendees will require their own laptop to perform the hands-on exercises. Topics Covered Include:
{"title":"A Hands-On Introduction To SYCL","authors":"Rod Burns, I. Vorobtsov, Aksel Alpay, R. Keryell, Michael Steyer, Gavin Brown","doi":"10.1145/3456669.3456682","DOIUrl":"https://doi.org/10.1145/3456669.3456682","url":null,"abstract":"SYCL is a programming model that lets developers support a wide variety of devices (CPUs, GPUs, and more) from a single code base. Given the growing heterogeneity of processor roadmaps, moving to a platform-independent model such as SYCL is essential for modern software developers. SYCL has the further advantage of supporting a single-source style of programming from completely standard C++. In this tutorial, we will introduce SYCL and provide programmers with a solid foundation they can build on to gain mastery of this language. This is a hands-on tutorial. The real learning will happen as students write code. The format will be short presentations followed by hands-on exercises. Hence, attendees will require their own laptop to perform the hands-on exercises. Topics Covered Include:","PeriodicalId":73497,"journal":{"name":"International Workshop on OpenCL","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77973354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Open standards are being looked at by the automotive domain as an attractive alternative to proprietary solutions for enabling sensor fusion systems in cheap mass-market vehicles. The open standard specifications for SYCL, OpenCL and Vulkan were not always designed with safety in mind, yet they could be at the centre of tomorrow's highly critical systems in a vehicle.
{"title":"Can SYCL and OpenCL meet the challenges of functional safety?","authors":"Rod Burns, Illya Rudkin","doi":"10.1145/3456669.3456688","DOIUrl":"https://doi.org/10.1145/3456669.3456688","url":null,"abstract":"Open standards are being looked at by the automotive domain as an attractive alternative to proprietary solutions for enabling sensor fusion systems in cheap mass-market vehicles. The open standard specifications for SYCL, OpenCL and Vulkan were not always designed with safety in mind, yet they could be at the centre of tomorrow's highly critical systems in a vehicle.","PeriodicalId":73497,"journal":{"name":"International Workshop on OpenCL","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88842852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heterogeneous computing has emerged as an important method for supporting more than one kind of processor or accelerator in a program. The Khronos SYCL [3] standard defines an abstract programming model for heterogeneous computing. The oneAPI Specification [10] and, at its core, the DPC++ programming language [9] are built on top of the SYCL standards. In this presentation, we will review the implementation steps taken to add support for the Huawei Ascend AI Chipset to DPC++.
{"title":"Extending DPC++ with Support for Huawei Ascend AI Chipset","authors":"W. Feng, Rasool Maghareh, Kai-Ting Amy Wang","doi":"10.1145/3456669.3456684","DOIUrl":"https://doi.org/10.1145/3456669.3456684","url":null,"abstract":"Heterogeneous computing has emerged as an important method for supporting more than one kind of processor or accelerator in a program. The Khronos SYCL [3] standard defines an abstract programming model for heterogeneous computing. The oneAPI Specification [10] and, at its core, the DPC++ programming language [9] are built on top of the SYCL standards. In this presentation, we will review the implementation steps taken to add support for the Huawei Ascend AI Chipset to DPC++.","PeriodicalId":73497,"journal":{"name":"International Workshop on OpenCL","volume":"144 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77534171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Siva Rama Krishna Reddy, Hongqiang Wang, Adarsh Golikeri, Alex Bourd
{"title":"Machine learning training with Tensor Virtual Machine (TVM) and Adreno GPUs","authors":"Siva Rama Krishna Reddy, Hongqiang Wang, Adarsh Golikeri, Alex Bourd","doi":"10.1145/3456669.3456702","DOIUrl":"https://doi.org/10.1145/3456669.3456702","url":null,"abstract":"","PeriodicalId":73497,"journal":{"name":"International Workshop on OpenCL","volume":"29 12","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72599481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abhishek Bagusetty, Jinsung Kim, Ajay Panyala, Á. Vázquez-Mayagoitia, K. Kowalski, S. Krishnamoorthy
{"title":"Approaching Coupled Cluster Theory with Perturbative Triples using SYCL","authors":"Abhishek Bagusetty, Jinsung Kim, Ajay Panyala, Á. Vázquez-Mayagoitia, K. Kowalski, S. Krishnamoorthy","doi":"10.1145/3456669.3456700","DOIUrl":"https://doi.org/10.1145/3456669.3456700","url":null,"abstract":"","PeriodicalId":73497,"journal":{"name":"International Workshop on OpenCL","volume":"48 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78246493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"hipSYCL in 2021: Peculiarities, unique features and SYCL 2020","authors":"Aksel Alpay, V. Heuveline","doi":"10.1145/3456669.3456691","DOIUrl":"https://doi.org/10.1145/3456669.3456691","url":null,"abstract":"","PeriodicalId":73497,"journal":{"name":"International Workshop on OpenCL","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78687299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yong Wang, Yongfa Zhou, Q. Wang, Wang Yang, Qing Xu, Chen Wang
Diagnostic ultrasound is a rapidly developing imaging technology that is widely used in the clinic. A typical ultrasound imaging pipeline includes the following algorithms: beamforming, envelope detection, log-compression, and scan-conversion [1]. Traditionally, ultrasound imaging is implemented using application-specific integrated circuits (ASICs) and FPGAs because of its high throughput and massive data processing requirements. With the development of GPGPU and its programming environments (e.g. CUDA), researchers have used software to implement ultrasound imaging algorithms [2], [3]. Currently, two factors limit the development of ultrasound imaging. First, using a hardware development approach to implement ultrasound imaging algorithms is complex, time-consuming, and inflexible. Second, the existing CUDA-based ultrasound imaging implementations are limited to Nvidia hardware, which restricts their use on other architectures. oneAPI is a cross-platform, unified programming environment developed by Intel. It enables heterogeneous computing across multiple hardware architectures using Data Parallel C++ (DPC++). This new programming suite can be used to address the problems mentioned above. Specifically, using a high-level language like DPC++ to program FPGAs can accelerate ultrasound imaging application development, and SYCL-based ultrasound imaging applications can be easily migrated to other vendors' hardware. To implement an ultrasound imaging application across multiple architectures (e.g., GPU, FPGA, and CPU) in a unified programming environment, we migrated the CUDA-based open-source ultrasound imaging project SUPRA [4]. The migration was performed using the oneAPI compatibility tool (dpct). After migration, the code was tuned to run on GPU, FPGA, and CPU. In this talk, we will discuss our experiences with the complete process of migrating CUDA code to oneAPI code.
First, the whole process of migrating a CUDA code base using dpct will be presented, including usage, code modification, API comparison, and build instructions. Second, the computational characteristics of the ultrasound imaging algorithms will be analyzed, and we will show how to optimize the application on Intel GPUs, including ESIMD usage. Third, our early experiences of tuning the migrated code to target FPGA will be highlighted; this will include rewriting device code for FPGA and programming techniques to improve performance on FPGA. A comparison of the device code for GPU and FPGA will also be discussed. Last, we will compare the performance and computation results of the ultrasound imaging algorithms on different hardware, including Intel GPUs (integrated and discrete), an Intel Arria 10 FPGA, an Intel CPU, and Nvidia GTX 1080 and GTX 960M GPUs.
{"title":"Developing medical ultrasound imaging application across GPU, FPGA, and CPU using oneAPI","authors":"Yong Wang, Yongfa Zhou, Q. Wang, Wang Yang, Qing Xu, Chen Wang","doi":"10.1145/3456669.3456680","DOIUrl":"https://doi.org/10.1145/3456669.3456680","url":null,"abstract":"Diagnostic ultrasound is a rapidly developing imaging technology that is widely used in the clinic. A typical ultrasound imaging pipeline includes the following algorithms: beamforming, envelope detection, log-compression, and scan-conversion [1]. Traditionally, ultrasound imaging is implemented using application-specific integrated circuits (ASICs) and FPGAs because of its high throughput and massive data processing requirements. With the development of GPGPU and its programming environments (e.g. CUDA), researchers have used software to implement ultrasound imaging algorithms [2], [3]. Currently, two factors limit the development of ultrasound imaging. First, using a hardware development approach to implement ultrasound imaging algorithms is complex, time-consuming, and inflexible. Second, the existing CUDA-based ultrasound imaging implementations are limited to Nvidia hardware, which restricts their use on other architectures. oneAPI is a cross-platform, unified programming environment developed by Intel. It enables heterogeneous computing across multiple hardware architectures using Data Parallel C++ (DPC++). This new programming suite can be used to address the problems mentioned above. Specifically, using a high-level language like DPC++ to program FPGAs can accelerate ultrasound imaging application development, and SYCL-based ultrasound imaging applications can be easily migrated to other vendors' hardware. To implement an ultrasound imaging application across multiple architectures (e.g., GPU, FPGA, and CPU) in a unified programming environment, we migrated the CUDA-based open-source ultrasound imaging project SUPRA [4]. 
The migration was performed using the oneAPI compatibility tool (dpct). After migration, the code was tuned to run on GPU, FPGA, and CPU. In this talk, we will discuss our experiences with the complete process of migrating CUDA code to oneAPI code. First, the whole process of migrating a CUDA code base using dpct will be presented, including usage, code modification, API comparison, and build instructions. Second, the computational characteristics of the ultrasound imaging algorithms will be analyzed, and we will show how to optimize the application on Intel GPUs, including ESIMD usage. Third, our early experiences of tuning the migrated code to target FPGA will be highlighted; this will include rewriting device code for FPGA and programming techniques to improve performance on FPGA. A comparison of the device code for GPU and FPGA will also be discussed. Last, we will compare the performance and computation results of the ultrasound imaging algorithms on different hardware, including Intel GPUs (integrated and discrete), an Intel Arria 10 FPGA, an Intel CPU, and Nvidia GTX 1080 and GTX 960M GPUs.","PeriodicalId":73497,"journal":{"name":"International Workshop on OpenCL","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88930572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rod Burns, S. Larsen, B. Cook, D. Doerfler, Kevin G. Harms, T. Applencourt, Stuart Adams
{"title":"Bringing SYCL to Ampere architecture","authors":"Rod Burns, S. Larsen, B. Cook, D. Doerfler, Kevin G. Harms, T. Applencourt, Stuart Adams","doi":"10.1145/3456669.3456685","DOIUrl":"https://doi.org/10.1145/3456669.3456685","url":null,"abstract":"","PeriodicalId":73497,"journal":{"name":"International Workshop on OpenCL","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87605619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}