A Comparison of SYCL, OpenCL, CUDA, and OpenMP for Massively Parallel Support Vector Machine Classification on Multi-Vendor Hardware

Marcel Breyer, Alexander Van Craen, D. Pflüger
{"title":"SYCL、OpenCL、CUDA和OpenMP在多厂商硬件上大规模并行支持向量机分类的比较","authors":"Marcel Breyer, Alexander Van Craen, D. Pflüger","doi":"10.1145/3529538.3529980","DOIUrl":null,"url":null,"abstract":"In scientific computing and Artificial Intelligence (AI), which both rely on massively parallel tasks, frameworks like the Compute Unified Device Architecture (CUDA) and the Open Computing Language (OpenCL) are widely used to harvest the computational power of accelerator cards, in particular of Graphics Processing Units (GPUs). A few years ago, GPUs from NVIDIA were used almost exclusively for these tasks but meanwhile, AMD and Intel are increasing their shares of the GPUs market. This introduces many new challenges for code development, as the prevailing CUDA code can only run on NVIDIA hardware and must be adapted or even completely rewritten to run on GPUs from AMD or Intel. In this paper, we compare the different competing programming frameworks OpenMP, CUDA, OpenCL, and SYCL, paying special attention to the two SYCL implementations hipSYCL and DPC++. Thereby, we investigate the different frameworks with respect to their usability, performance, and performance portability on a variety of hardware platforms from different vendors, i.e., GPUs from NVIDIA, AMD, and Intel and Central Processing Units (CPUs) from AMD and Intel. Besides discussing the runtimes of these frameworks on the different hardware platforms, we also focus our comparison on the differences between the nd_range kernel formulation and the SYCL specific hierarchical kernels. Our Parallel Least Squares Support Vector Machine (PLSSVM) library implements backends for the four previously mentioned programming frameworks for a Least Squares Support Vector Machines (LS-SVMs). At its example, we show which of the frameworks is best suited for a standard workload that is frequently employed in scientific computing and AI, depending on the target hardware: The most computationally intensive part of our PLSSVM library is solving a system of linear equations using the Conjugate Gradient (CG) method. Specifically, we parallelize the implicit matrix-vector multiplication inside the CG method, a workload common in many scientific codes. The PLSSVM code, utility scripts, and documentation are all available on GitHub: https://github.com/SC-SGS/PLSSVM.","PeriodicalId":73497,"journal":{"name":"International Workshop on OpenCL","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"A Comparison of SYCL, OpenCL, CUDA, and OpenMP for Massively Parallel Support Vector Machine Classification on Multi-Vendor Hardware\",\"authors\":\"Marcel Breyer, Alexander Van Craen, D. Pflüger\",\"doi\":\"10.1145/3529538.3529980\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In scientific computing and Artificial Intelligence (AI), which both rely on massively parallel tasks, frameworks like the Compute Unified Device Architecture (CUDA) and the Open Computing Language (OpenCL) are widely used to harvest the computational power of accelerator cards, in particular of Graphics Processing Units (GPUs). A few years ago, GPUs from NVIDIA were used almost exclusively for these tasks but meanwhile, AMD and Intel are increasing their shares of the GPUs market. 
This introduces many new challenges for code development, as the prevailing CUDA code can only run on NVIDIA hardware and must be adapted or even completely rewritten to run on GPUs from AMD or Intel. In this paper, we compare the different competing programming frameworks OpenMP, CUDA, OpenCL, and SYCL, paying special attention to the two SYCL implementations hipSYCL and DPC++. Thereby, we investigate the different frameworks with respect to their usability, performance, and performance portability on a variety of hardware platforms from different vendors, i.e., GPUs from NVIDIA, AMD, and Intel and Central Processing Units (CPUs) from AMD and Intel. Besides discussing the runtimes of these frameworks on the different hardware platforms, we also focus our comparison on the differences between the nd_range kernel formulation and the SYCL specific hierarchical kernels. Our Parallel Least Squares Support Vector Machine (PLSSVM) library implements backends for the four previously mentioned programming frameworks for a Least Squares Support Vector Machines (LS-SVMs). At its example, we show which of the frameworks is best suited for a standard workload that is frequently employed in scientific computing and AI, depending on the target hardware: The most computationally intensive part of our PLSSVM library is solving a system of linear equations using the Conjugate Gradient (CG) method. Specifically, we parallelize the implicit matrix-vector multiplication inside the CG method, a workload common in many scientific codes. The PLSSVM code, utility scripts, and documentation are all available on GitHub: https://github.com/SC-SGS/PLSSVM.\",\"PeriodicalId\":73497,\"journal\":{\"name\":\"International Workshop on OpenCL\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Workshop on OpenCL\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3529538.3529980\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop on OpenCL","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3529538.3529980","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

Abstract

In scientific computing and Artificial Intelligence (AI), which both rely on massively parallel tasks, frameworks like the Compute Unified Device Architecture (CUDA) and the Open Computing Language (OpenCL) are widely used to harvest the computational power of accelerator cards, in particular of Graphics Processing Units (GPUs). A few years ago, GPUs from NVIDIA were used almost exclusively for these tasks, but AMD and Intel have since been increasing their shares of the GPU market. This introduces many new challenges for code development, as the prevailing CUDA code can only run on NVIDIA hardware and must be adapted or even completely rewritten to run on GPUs from AMD or Intel. In this paper, we compare the competing programming frameworks OpenMP, CUDA, OpenCL, and SYCL, paying special attention to the two SYCL implementations hipSYCL and DPC++. We investigate the frameworks with respect to their usability, performance, and performance portability on a variety of hardware platforms from different vendors, i.e., GPUs from NVIDIA, AMD, and Intel, and Central Processing Units (CPUs) from AMD and Intel. Besides discussing the runtimes of these frameworks on the different hardware platforms, we also focus our comparison on the differences between the nd_range kernel formulation and the SYCL-specific hierarchical kernels. Our Parallel Least Squares Support Vector Machine (PLSSVM) library implements backends for the four aforementioned programming frameworks for Least Squares Support Vector Machines (LS-SVMs). Using it as an example, we show which of the frameworks is best suited for a standard workload frequently employed in scientific computing and AI, depending on the target hardware: the most computationally intensive part of our PLSSVM library is solving a system of linear equations using the Conjugate Gradient (CG) method. Specifically, we parallelize the implicit matrix-vector multiplication inside the CG method, a workload common in many scientific codes. The PLSSVM code, utility scripts, and documentation are all available on GitHub: https://github.com/SC-SGS/PLSSVM.
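To make the contrast between the two compared SYCL kernel styles concrete, the following is a minimal sketch (not taken from PLSSVM; the kernel bodies, names, and sizes are illustrative assumptions) that expresses the same element-wise operation once in the nd_range formulation and once in the SYCL-specific hierarchical formulation.

```cpp
// Minimal sketch contrasting the two SYCL kernel formulations the paper
// compares. All names and sizes here are illustrative, not PLSSVM code.
#include <sycl/sycl.hpp>
#include <cstddef>

constexpr std::size_t N = 1024;     // assumed problem size
constexpr std::size_t LOCAL = 128;  // assumed work-group size

void scale_nd_range(sycl::queue &q, sycl::buffer<float, 1> &buf, float alpha) {
    q.submit([&](sycl::handler &cgh) {
        sycl::accessor acc{buf, cgh, sycl::read_write};
        // nd_range formulation: global and local sizes are given explicitly;
        // each work-item processes exactly one element.
        cgh.parallel_for(sycl::nd_range<1>{sycl::range<1>{N}, sycl::range<1>{LOCAL}},
                         [=](sycl::nd_item<1> it) {
                             acc[it.get_global_id(0)] *= alpha;
                         });
    });
}

void scale_hierarchical(sycl::queue &q, sycl::buffer<float, 1> &buf, float alpha) {
    q.submit([&](sycl::handler &cgh) {
        sycl::accessor acc{buf, cgh, sycl::read_write};
        // Hierarchical formulation: an outer parallel_for_work_group over
        // work-groups and an inner parallel_for_work_item loop; the runtime
        // decides how logical work-items map onto physical ones.
        cgh.parallel_for_work_group(sycl::range<1>{N / LOCAL}, sycl::range<1>{LOCAL},
            [=](sycl::group<1> g) {
                g.parallel_for_work_item([&](sycl::h_item<1> it) {
                    acc[it.get_global_id(0)] *= alpha;
                });
            });
    });
}
```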
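The computational pattern the paper parallelizes, an implicit matrix-vector product inside a CG solver, can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's actual API: the names (rbf_kernel, implicit_matvec, cg_solve), the choice of RBF kernel, the 1/C diagonal shift, and the plain OpenMP loop are all assumptions for exposition. The essential point is that the LS-SVM kernel-matrix entries are recomputed on the fly in every CG iteration instead of being stored.

```cpp
// Hedged sketch of a CG solve with an implicit (matrix-free) matvec.
// Names and the concrete LS-SVM system formulation are illustrative.
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // data points, row-major

// Assumed RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2).
double rbf_kernel(const std::vector<double> &x, const std::vector<double> &y,
                  double gamma) {
    double dist = 0.0;
    for (std::size_t d = 0; d < x.size(); ++d) {
        const double diff = x[d] - y[d];
        dist += diff * diff;
    }
    return std::exp(-gamma * dist);
}

// y = A * p with A_ij = k(x_i, x_j) (+ 1/C on the diagonal, an assumed
// LS-SVM regularization term); A is never materialized.
std::vector<double> implicit_matvec(const Matrix &data, const std::vector<double> &p,
                                    double gamma, double cost) {
    const std::size_t n = data.size();
    std::vector<double> y(n, 0.0);
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            double a_ij = rbf_kernel(data[i], data[j], gamma);
            if (i == j) a_ij += 1.0 / cost;
            sum += a_ij * p[j];
        }
        y[i] = sum;
    }
    return y;
}

// Standard unpreconditioned CG driven by the implicit matvec above.
std::vector<double> cg_solve(const Matrix &data, const std::vector<double> &b,
                             double gamma, double cost,
                             std::size_t max_iter = 1000, double eps = 1e-10) {
    const std::size_t n = b.size();
    std::vector<double> x(n, 0.0), r = b, p = b;
    double rr = 0.0;
    for (std::size_t i = 0; i < n; ++i) rr += r[i] * r[i];
    for (std::size_t iter = 0; iter < max_iter && rr > eps; ++iter) {
        const std::vector<double> Ap = implicit_matvec(data, p, gamma, cost);
        double pAp = 0.0;
        for (std::size_t i = 0; i < n; ++i) pAp += p[i] * Ap[i];
        const double alpha = rr / pAp;
        double rr_new = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
            rr_new += r[i] * r[i];
        }
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + (rr_new / rr) * p[i];
        rr = rr_new;
    }
    return x;
}
```

In the library itself, it is this inner matvec loop that each of the four backends (OpenMP, CUDA, OpenCL, SYCL) implements for the respective target hardware, which is why it dominates the runtime comparison.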