Towards performance portability of AI graphs using SYCL

Kumudha Narasimhan, Ouadie El Farouki, M. Goli, Muhammad Tanvir, S. Georgiev, Isaac Ault
{"title":"Towards performance portability of AI graphs using SYCL","authors":"Kumudha Narasimhan, Ouadie El Farouki, M. Goli, Muhammad Tanvir, S. Georgiev, Isaac Ault","doi":"10.1109/P3HPC56579.2022.00016","DOIUrl":null,"url":null,"abstract":"The wide adoption of Deep Neural Networks (DNN) has served as an incentive to design and manufacture powerful and specialized hardware technologies, targeting systems from Edge devices to Cloud and supercomputers.While the proposed ONNX as a de facto for DNN model description, provides portability across various AI frameworks, supporting DNN models on various hardware architectures remains challenging.SYCL provides a C++-based portable parallel programming model to target various devices. Thus, enabling SYCL backend for an AI framework can lead to a hardware-agnostic model for heterogeneous systems.This paper proposes a SYCL backend for ONNXRuntime as a possible solution towards the performance portability of deep learning algorithms. The proposed backend uses existing state-of-the-art SYCL-DNN and SYCL-BLAS libraries to invoke tuned SYCL kernels for DNN operations. Our performance evaluation shows that the proposed approach can achieve comparable performance with respect to the state-of-the-art optimized vendor-specific libraries.","PeriodicalId":261766,"journal":{"name":"2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/P3HPC56579.2022.00016","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The wide adoption of Deep Neural Networks (DNN) has served as an incentive to design and manufacture powerful and specialized hardware technologies, targeting systems from Edge devices to Cloud and supercomputers.While the proposed ONNX as a de facto for DNN model description, provides portability across various AI frameworks, supporting DNN models on various hardware architectures remains challenging.SYCL provides a C++-based portable parallel programming model to target various devices. Thus, enabling SYCL backend for an AI framework can lead to a hardware-agnostic model for heterogeneous systems.This paper proposes a SYCL backend for ONNXRuntime as a possible solution towards the performance portability of deep learning algorithms. The proposed backend uses existing state-of-the-art SYCL-DNN and SYCL-BLAS libraries to invoke tuned SYCL kernels for DNN operations. Our performance evaluation shows that the proposed approach can achieve comparable performance with respect to the state-of-the-art optimized vendor-specific libraries.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
使用SYCL实现AI图的性能可移植性
深度神经网络(DNN)的广泛采用激励了设计和制造强大而专业的硬件技术,目标是从边缘设备到云和超级计算机的系统。虽然提议的ONNX作为DNN模型描述的事实,提供了跨各种AI框架的可移植性,但在各种硬件架构上支持DNN模型仍然具有挑战性。SYCL提供了一种基于c++的可移植并行编程模型来针对各种设备。因此,为AI框架启用SYCL后端可以导致异构系统的硬件无关模型。本文提出了一个用于ONNXRuntime的SYCL后端,作为实现深度学习算法性能可移植性的可能解决方案。提议的后端使用现有的最先进的SYCL-DNN和SYCL- blas库来为DNN操作调用调优的SYCL内核。我们的性能评估表明,所建议的方法可以达到与最先进的优化供应商特定库相当的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Understanding Strong Scaling on GPUs Using Empirical Performance Saturation Size Performance Portability of Sparse Block Diagonal Matrix Multiple Vector Multiplications on GPUs Performance portable Vlasov code with C++ parallel algorithm Leveraging Compiler-Based Translation to Evaluate a Diversity of Exascale Platforms Heterogeneous Programming for the Homogeneous Majority
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1