Towards performance portability of AI graphs using SYCL

2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC). Pub Date: 2022-11-01. DOI: 10.1109/P3HPC56579.2022.00016

Kumudha Narasimhan, Ouadie El Farouki, M. Goli, Muhammad Tanvir, S. Georgiev, Isaac Ault



Abstract

The wide adoption of Deep Neural Networks (DNNs) has served as an incentive to design and manufacture powerful, specialized hardware, targeting systems from edge devices to the cloud and supercomputers. While ONNX, the de facto standard for DNN model description, provides portability across AI frameworks, supporting DNN models on diverse hardware architectures remains challenging. SYCL provides a C++-based portable parallel programming model that targets a variety of devices. Thus, enabling a SYCL backend for an AI framework can lead to a hardware-agnostic model for heterogeneous systems. This paper proposes a SYCL backend for ONNXRuntime as a possible solution towards the performance portability of deep learning algorithms. The proposed backend uses the existing state-of-the-art SYCL-DNN and SYCL-BLAS libraries to invoke tuned SYCL kernels for DNN operations. Our performance evaluation shows that the proposed approach can achieve performance comparable to that of state-of-the-art, optimized vendor-specific libraries.
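To make the backend idea concrete, the following is a minimal sketch in standard C++ of how a runtime might place each operator of a model on a preferred backend and fall back to another when an operator is not supported. It is an illustration under stated assumptions, not the paper's implementation: `Tensor`, `Kernel`, `Backend`, `relu`, `twice` and `run_graph` are hypothetical names, the graph is simplified to a linear chain of operators, and the kernels are plain loops standing in for tuned library kernels such as those SYCL-DNN or SYCL-BLAS would provide. No ONNX Runtime or SYCL API is used.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not the ONNX Runtime or SYCL-DNN API.
using Tensor = std::vector<float>;
using Kernel = std::function<Tensor(const Tensor&)>;

// A backend maps operator names to the kernels it is able to run.
struct Backend {
    std::string name;
    std::map<std::string, Kernel> kernels;
    bool supports(const std::string& op) const { return kernels.count(op) != 0; }
};

// Element-wise ReLU, standing in for a tuned library kernel.
inline Tensor relu(const Tensor& x) {
    Tensor y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] > 0.0f ? x[i] : 0.0f;
    return y;
}

// Element-wise doubling, standing in for an operator only the fallback implements.
inline Tensor twice(const Tensor& x) {
    Tensor y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = 2.0f * x[i];
    return y;
}

// Runs a linear chain of operators. Each operator goes to `preferred` when that
// backend supports it and to `fallback` otherwise. The name of the backend that
// ran each operator is appended to `placement`.
inline Tensor run_graph(const std::vector<std::string>& ops, const Tensor& input,
                        const Backend& preferred, const Backend& fallback,
                        std::vector<std::string>& placement) {
    Tensor t = input;
    for (const auto& op : ops) {
        const Backend& b = preferred.supports(op) ? preferred : fallback;
        assert(b.supports(op) && "no backend implements this operator");
        t = b.kernels.at(op)(t);
        placement.push_back(b.name);
    }
    return t;
}
```

As a usage example, a preferred backend that registers only `Relu` and a fallback that registers both `Relu` and `Twice` would run the chain `Relu`, `Twice` with the first operator on the preferred backend and the second on the fallback. Keeping the operator-to-kernel mapping inside the backend is what lets the model description itself stay hardware-agnostic in this sketch.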

