
International Workshop on OpenCL: Latest Publications

FAST: A framework for high-performance medical image computing and visualization
Pub Date: 2021-04-27 DOI: 10.1145/3456669.3456717
E. Smistad
Medical image processing and visualization is often computationally demanding. Ultrasound images are acquired in real time and need to be processed at a high frame rate with low latency. Computed tomography (CT) and magnetic resonance imaging (MRI) create large three-dimensional volumes with sizes up to 512 × 512 × 800 voxels. In digital pathology, whole-slide microscopy images can reach extreme sizes of up to 200,000 × 100,000 pixels, which do not even fit into the memory of most computers. Thus, there is a need for smart data storage, processing and visualization methods to handle medical image data. The development of FAST started in 2014; the goal was to create an open-source framework that made GPU and parallel processing of medical images easy and portable. Although popular image processing libraries such as the Visualization Toolkit (VTK), the Insight Toolkit (ITK) and OpenCV existed, their GPU processing capabilities were implemented ad hoc and often required copying data back and forth between the GPU and CPU. It was therefore decided to use the new OpenCL API to create a cross-platform framework designed bottom-up with GPU processing at its very core. One of the design goals was to relieve the developer of the burden of moving data back and forth between different processors and memory spaces. Instead, the developer requests access to the data on a given processor, and FAST copies and updates the data as needed. Now, seven years later, FAST version 3.2 has been released, and it still uses OpenCL 1.2 and OpenGL 3.3 at the core of almost all of its operations. FAST can stream images in real time from ultrasound scanners, webcams and Intel’s RealSense depth camera, and it can read many different formats from disk, including medical formats such as DICOM and MetaImage, as well as huge microscopy images stored as tiled image pyramids.
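The abstract states the design goal (the developer requests data on a processor and FAST copies it as needed) but not the mechanism. The following is a minimal sketch of one common way to get that behaviour, assuming a per-memory-space "valid" flag that triggers a copy only when the requested copy is stale. `ImageData`, `Device`, `Access` and `access()` are hypothetical names, not FAST's API, and both "copies" live in host memory so the sketch runs without a GPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical memory spaces; a real framework would back the GPU side
// with OpenCL buffers or images.
enum class Device { Host, GPU };
enum class Access { Read, Write };

// Hypothetical data object: one copy per memory space, copied lazily
// only when the requested copy is out of date.
class ImageData {
public:
    explicit ImageData(std::size_t n) : host_(n, 0.0f), gpu_(n, 0.0f) {}

    std::vector<float>& access(Device d, Access a) {
        bool& valid = (d == Device::Host) ? hostValid_ : gpuValid_;
        bool& other = (d == Device::Host) ? gpuValid_ : hostValid_;
        if (!valid) {
            // Stand-in for a host<->device transfer.
            if (d == Device::Host) host_ = gpu_; else gpu_ = host_;
            valid = true;
            ++transfers_;
        }
        if (a == Access::Write) other = false;  // writing makes the other copy stale
        return d == Device::Host ? host_ : gpu_;
    }

    int transfers() const { return transfers_; }

private:
    std::vector<float> host_, gpu_;  // both simulated in host memory here
    bool hostValid_ = true;          // assumption: data starts out on the host
    bool gpuValid_ = false;
    int transfers_ = 0;
};
```

Usage: writing on the host and then reading on the GPU causes exactly one copy; repeated GPU reads cause none, and a later host read after a GPU write causes one more. Distinguishing read from write access is what lets the framework avoid redundant copies.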
FAST uses a processing pipeline concept: you first define a pipeline as a set of processing and visualization steps, and then start the processing by executing the pipeline. The advantage is that data sources and processing steps are easy to change; the same pipeline used to process an ultrasound image stored on disk can be used to process a real-time stream of ultrasound images. Today, FAST pipelines can be created with C++, Python 3, or even without any programming by using simple text files. The pipeline approach also opens up possibilities for load balancing and tuning based on analyzing the pipeline as a computational graph, although this has not yet been implemented. In the last five years or so, deep neural networks have become the standard for almost all image processing tasks. Many high-performance frameworks for deep neural network inference already exist, but they have very different APIs and use different formats for storing neural network models. FAST now provides a common API for neural networks with multiple backends, such as NVIDIA’s TensorRT, Intel’s OpenVINO and Google’s TensorFlow. This removes the burden on the user of learning the API of each inference library, and makes neural network inference as simple as loading a model stored on disk. This talk will present the FAST framework and how OpenCL was used to build it. The trade-off between portability, ease of use, code complexity and performance has been a constant challenge, often resulting in sacrificed performance or in having to write multiple versions of the same algorithm to handle different OpenCL implementations. The talk will also discuss some important features, such as OpenGL interoperability and 2D/3D images/textures. FAST is open source, and we invite the community to contribute through GitHub at https://github.com/smistad/FAST
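The define-then-execute pipeline idea, and the claim that the same pipeline can process either files on disk or a live stream, can be illustrated with a small sketch. It assumes a pull-based design in which steps are registered first and frames are only pulled through them when `run()` is called; `Source`, `FileSource`, `StreamSource` and `Pipeline` are hypothetical names, not FAST's C++ or Python API, and a `Frame` is just a vector of floats.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

using Frame = std::vector<float>;  // stand-in for an image

// Hypothetical data-source interface: yields frames until exhausted.
struct Source {
    virtual ~Source() = default;
    virtual std::optional<Frame> next() = 0;
};

// Hypothetical "file" source: replays frames that were loaded earlier.
class FileSource : public Source {
public:
    explicit FileSource(std::vector<Frame> frames) : frames_(std::move(frames)) {}
    std::optional<Frame> next() override {
        if (i_ >= frames_.size()) return std::nullopt;
        return frames_[i_++];
    }
private:
    std::vector<Frame> frames_;
    std::size_t i_ = 0;
};

// Hypothetical "stream" source: produces a fixed number of synthetic frames.
class StreamSource : public Source {
public:
    explicit StreamSource(int count) : remaining_(count) {}
    std::optional<Frame> next() override {
        if (remaining_ <= 0) return std::nullopt;
        --remaining_;
        return Frame{1.0f, 2.0f, 3.0f};
    }
private:
    int remaining_;
};

// Steps are declared first; no processing happens until run() is called.
class Pipeline {
public:
    using Step = std::function<Frame(const Frame&)>;
    Pipeline& add(Step s) { steps_.push_back(std::move(s)); return *this; }

    // Pulls every frame from the source through all steps, in order.
    std::vector<Frame> run(Source& src) const {
        std::vector<Frame> out;
        while (auto f = src.next()) {
            Frame x = std::move(*f);
            for (const auto& s : steps_) x = s(x);
            out.push_back(std::move(x));
        }
        return out;
    }
private:
    std::vector<Step> steps_;
};
```

Because the pipeline only depends on the `Source` interface, swapping a `FileSource` for a `StreamSource` requires no change to the processing steps, which is the property the abstract highlights.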
Citations: 2
Experimenting with C++ libraries in OpenCL kernel code
Pub Date: 2021-04-27 DOI: 10.1145/3456669.3456675
Ole Strohm, Anastasia Stulova
To support the full functionality of <type_traits>, two compile-time extensions were added. Because they are only used in metaprogramming, they are safe for <type_traits>, and no extra functionality is needed on conformant OpenCL devices [4]. The extensions are required for specifying pointers to functions in is_member_function_pointer, and for specifying variadic prototypes in result_of, invoke_result, is_invocable, is_nothrow_invocable and is_member_function_pointer.
clang -cl-std=clc++ -I<path to libcxx>/include -DN=10 test.cl
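To show why these extensions are only needed at compile time, here is a host-side sketch in standard C++17 (compiled with an ordinary C++ compiler, not the OpenCL one) that exercises the traits named above purely through `static_assert`. The types and functions (`Probe`, `scale`, `twice`, `logValues`) are hypothetical. Our assumption is that the extensions are needed because OpenCL kernel languages normally reject function pointers and variadic prototypes, which the queries below mention in their types; the abstract does not name the extensions. `result_of` is omitted because it was deprecated in C++17 and removed in C++20; `invoke_result` replaces it.

```cpp
#include <type_traits>

// Hypothetical types and functions, used only as subjects of trait queries.
struct Probe {
    int read(int channel) const { return channel; }
    int id = 0;
};
inline int scale(int x) { return 2 * x; }
inline int twice(int x) noexcept { return 2 * x; }
inline int logValues(int count, ...) { return count; }  // C-style variadic prototype

// Pointer-to-member-function vs pointer-to-data-member.
static_assert(std::is_member_function_pointer_v<decltype(&Probe::read)>);
static_assert(!std::is_member_function_pointer_v<decltype(&Probe::id)>);
// A variadic member-function prototype, named only as a type.
static_assert(std::is_member_function_pointer_v<int (Probe::*)(int, ...)>);

// Call-signature queries on function-pointer types; nothing is called.
static_assert(std::is_same_v<std::invoke_result_t<decltype(&scale), int>, int>);
static_assert(std::is_invocable_v<decltype(&Probe::read), const Probe&, int>);
static_assert(!std::is_invocable_v<decltype(&scale), Probe>);

// noexcept is part of the function type since C++17.
static_assert(std::is_nothrow_invocable_v<decltype(&twice), int>);
static_assert(!std::is_nothrow_invocable_v<decltype(&scale), int>);

// A variadic prototype accepts extra arguments of other types.
static_assert(std::is_invocable_r_v<int, decltype(&logValues), int, double, char>);
```

Every check above is resolved by the compiler, and no function pointer is ever called at run time, which is consistent with the abstract's point that conformant devices need no extra functionality.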
Citations: 1
Can SYCL and OpenCL meet the challenges of functional safety?
Pub Date: 2021-04-27 DOI: 10.1145/3456669.3456688
Rod Burns, Illya Rudkin
The automotive domain is looking at open standards as an attractive alternative to proprietary solutions for enabling sensor fusion systems in cheap mass-market vehicles. The open standard specifications for SYCL, OpenCL and Vulkan were not always designed with safety in mind, yet they could be at the centre of tomorrow's highly critical systems in a vehicle.
Citations: 0
A Hands-On Introduction To SYCL
Pub Date: 2021-04-27 DOI: 10.1145/3456669.3456682
Rod Burns, I. Vorobtsov, Aksel Alpay, R. Keryell, Michael Steyer, Gavin Brown
SYCL is a programming model that lets developers support a wide variety of devices (CPUs, GPUs, and more) from a single code base. Given the growing heterogeneity of processor roadmaps, moving to a platform-independent model such as SYCL is essential for modern software developers. SYCL has the further advantage of supporting a single-source style of programming in completely standard C++. In this tutorial, we will introduce SYCL and give programmers a solid foundation they can build on to master the language. This is a hands-on tutorial: the real learning will happen as students write code. The format will be short presentations followed by hands-on exercises, so attendees will need their own laptops. Topics covered include:
Citations: 0
Machine learning training with Tensor Virtual Machine (TVM) and Adreno GPUs
Pub Date: 2021-04-27 DOI: 10.1145/3456669.3456702
Siva Rama Krishna Reddy, Hongqiang Wang, Adarsh Golikeri, Alex Bourd
Citations: 0
Extending DPC++ with Support for Huawei Ascend AI Chipset
Pub Date: 2021-04-27 DOI: 10.1145/3456669.3456684
W. Feng, Rasool Maghareh, Kai-Ting Amy Wang
Heterogeneous computing has emerged as an important method for supporting more than one kind of processor or accelerator in a program. The Khronos SYCL [3] standard defines an abstract programming model for heterogeneous computing. The oneAPI Specification [10], and the DPC++ programming language [9] at its core, are built on top of the SYCL standard. In this presentation, we will review the implementation steps taken to add support for the Huawei Ascend AI Chipset to DPC++.
Citations: 7
Approaching Coupled Cluster Theory with Perturbative Triples using SYCL
Pub Date: 2021-04-27 DOI: 10.1145/3456669.3456700
Abhishek Bagusetty, Jinsung Kim, Ajay Panyala, Á. Vázquez-Mayagoitia, K. Kowalski, S. Krishnamoorthy
Citations: 0
hipSYCL in 2021: Peculiarities, unique features and SYCL 2020
Pub Date: 2021-04-27 DOI: 10.1145/3456669.3456691
Aksel Alpay, V. Heuveline
Citations: 8
Developing medical ultrasound imaging application across GPU, FPGA, and CPU using oneAPI
Pub Date: 2021-04-27 DOI: 10.1145/3456669.3456680
Yong Wang, Yongfa Zhou, Q. Wang, Wang Yang, Qing Xu, Chen Wang
Diagnostic ultrasound is a rapidly developing imaging technology that is widely used in the clinic. A typical ultrasound imaging pipeline includes the following algorithms: beamforming, envelope detection, log compression and scan conversion [1]. Traditionally, ultrasound imaging has been implemented with application-specific integrated circuits (ASICs) and FPGAs because of its high throughput and massive data-processing requirements. With the development of GPGPU computing and its programming environments (e.g., CUDA), researchers have used software to implement ultrasound imaging algorithms [2], [3]. Currently, ultrasound imaging development faces two limiting factors. First, implementing ultrasound imaging algorithms with a hardware development approach is complex, time-consuming and inflexible. Second, existing CUDA-based ultrasound imaging implementations are limited to Nvidia hardware, which prevents them from being applied to other architectures. oneAPI is a cross-platform, unified programming environment developed by Intel. It enables heterogeneous computing across multiple hardware architectures using Data Parallel C++ (DPC++), and this new programming suite can address the problems mentioned above. Specifically, using a high-level language like DPC++ to program FPGAs can accelerate ultrasound imaging application development, and SYCL-based ultrasound imaging applications can easily be migrated to other vendors' hardware. To implement an ultrasound imaging application across multiple architectures (e.g., GPU, FPGA and CPU) in a unified programming environment, we migrated the CUDA-based open-source ultrasound imaging project SUPRA [4]. The migration was performed using the oneAPI compatibility tool (dpct). After migration, the code was tuned to run on GPU, FPGA and CPU. In this talk, we will discuss our experience with the complete process of migrating CUDA code to oneAPI code.
First, we will present the whole process of migrating a CUDA code base using dpct, including usage, code modification, API comparison and build instructions. Second, we will analyze the computational characteristics of the ultrasound imaging algorithms and show how to optimize the application on Intel GPUs, including the use of ESIMD. Third, we will highlight our early experience tuning the migrated code for FPGA targets, including rewriting device code for the FPGA and programming techniques for improving FPGA performance; a comparison of GPU and FPGA device code will also be discussed. Last, we will compare the performance and computation results of the ultrasound imaging algorithms on different hardware, including Intel GPUs (integrated and discrete), an Intel Arria 10 FPGA, an Intel CPU, and Nvidia GTX 1080 and GTX 960M GPUs.
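Three of the four pipeline stages named in the abstract can be sketched for a single image line in plain C++ on the CPU. This is a minimal illustration under stated assumptions, not SUPRA's or the authors' DPC++ code: the function names are hypothetical, beamforming is plain delay-and-sum with integer sample delays assumed to be precomputed from the probe geometry, envelope detection is a crude rectify-and-average stand-in for the usual Hilbert-transform approach, and scan conversion (resampling lines onto a Cartesian grid) is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

using Signal = std::vector<float>;

// Delay-and-sum beamforming for one image line. The per-channel integer
// sample delays are assumed to be precomputed from the probe geometry.
Signal delayAndSum(const std::vector<Signal>& channels,
                   const std::vector<std::size_t>& delays) {
    assert(!channels.empty() && channels.size() == delays.size());
    const std::size_t n = channels.front().size();
    Signal out(n, 0.0f);
    for (std::size_t c = 0; c < channels.size(); ++c) {
        // Align channel c by reading it delays[c] samples later, then accumulate.
        for (std::size_t t = 0; t < n && t + delays[c] < channels[c].size(); ++t) {
            out[t] += channels[c][t + delays[c]];
        }
    }
    return out;
}

// Crude envelope detection: full-wave rectification followed by a moving
// average (a stand-in for the usual Hilbert-transform approach).
Signal envelope(const Signal& rf, std::size_t halfWidth) {
    Signal out(rf.size(), 0.0f);
    for (std::size_t i = 0; i < rf.size(); ++i) {
        const std::size_t lo = i >= halfWidth ? i - halfWidth : 0;
        const std::size_t hi = std::min(rf.size() - 1, i + halfWidth);
        float sum = 0.0f;
        for (std::size_t j = lo; j <= hi; ++j) sum += std::fabs(rf[j]);
        out[i] = sum / static_cast<float>(hi - lo + 1);
    }
    return out;
}

// Log compression: convert the envelope to decibels relative to its peak and
// keep the top `dynamicRangeDb` decibels as 8-bit gray levels.
std::vector<std::uint8_t> logCompress(const Signal& env, float dynamicRangeDb) {
    std::vector<std::uint8_t> out(env.size(), 0);
    float peak = 0.0f;
    for (float v : env) peak = std::max(peak, v);
    if (peak <= 0.0f) return out;
    for (std::size_t i = 0; i < env.size(); ++i) {
        if (env[i] <= 0.0f) continue;  // log10(0) is undefined: leave the pixel black
        const float db = 20.0f * std::log10(env[i] / peak);  // always <= 0 dB
        const float norm = std::clamp((db + dynamicRangeDb) / dynamicRangeDb, 0.0f, 1.0f);
        out[i] = static_cast<std::uint8_t>(std::lround(norm * 255.0f));
    }
    return out;
}
```

Each stage is a per-sample loop with no dependencies between output samples, which is the kind of data parallelism that makes these algorithms a natural fit for GPU kernels or FPGA pipelines.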
Citations: 4
Bringing SYCL to Ampere architecture
Pub Date: 2021-04-27 DOI: 10.1145/3456669.3456685
Rod Burns, S. Larsen, B. Cook, D. Doerfler, Kevin G. Harms, T. Applencourt, Stuart Adams
Citations: 0