On the Compilation Performance of Current SYCL Implementations

Peter Thoman, Facundo Molina Heredia, T. Fahringer
{"title":"On the Compilation Performance of Current SYCL Implementations","authors":"Peter Thoman, Facundo Molina Heredia, T. Fahringer","doi":"10.1145/3529538.3529548","DOIUrl":null,"url":null,"abstract":"The Khronos SYCL abstraction layer is designed to enable programming heterogeneous platforms, consisting of host and accelerator devices, with a single-source code base. In order to allow for a high level of abstraction while still providing competitive runtime performance, both SYCL implementations and the software ecosystems built around SYCL applications frequently make heavy use of C++ templates. A potential consequence of this design choice, as well as the need to generate code for both a host and at least one device architecture, are significant compilation times. In this work we set out to study the relative compile-time performance and the impact of various SYCL features on compilation times across a selection of the most widely-used SYCL implementations. To this end, we introduce a code generator which creates SYCL kernels stressing various API features and instruction types, either in isolation or in combination, as well as an infrastructure to largely automate related experiments. We apply this infrastructure in a large-scale synthetic evaluation totaling 96000 compiler runs, which also includes a study of the compilation performance over time of the most widespread implementations. In addition to these synthetic experiments, we validate the applicability of our findings by measuring the compile times of two real-world industrial SYCL applications. On the basis of these experiments, we point out particularly impactful – in terms of compile-time performance – changes during the development of some SYCL implementations, and formulate suggestions for SYCL implementation developers as well as users. We have made both the code generator and all the tools we developed to carry out the experiments in this paper available as open source.","PeriodicalId":73497,"journal":{"name":"International Workshop on OpenCL","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop on OpenCL","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3529538.3529548","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

The Khronos SYCL abstraction layer is designed to enable programming heterogeneous platforms, consisting of host and accelerator devices, with a single-source code base. In order to allow for a high level of abstraction while still providing competitive runtime performance, both SYCL implementations and the software ecosystems built around SYCL applications frequently make heavy use of C++ templates. A potential consequence of this design choice, as well as the need to generate code for both a host and at least one device architecture, are significant compilation times. In this work we set out to study the relative compile-time performance and the impact of various SYCL features on compilation times across a selection of the most widely-used SYCL implementations. To this end, we introduce a code generator which creates SYCL kernels stressing various API features and instruction types, either in isolation or in combination, as well as an infrastructure to largely automate related experiments. We apply this infrastructure in a large-scale synthetic evaluation totaling 96000 compiler runs, which also includes a study of the compilation performance over time of the most widespread implementations. In addition to these synthetic experiments, we validate the applicability of our findings by measuring the compile times of two real-world industrial SYCL applications. On the basis of these experiments, we point out particularly impactful – in terms of compile-time performance – changes during the development of some SYCL implementations, and formulate suggestions for SYCL implementation developers as well as users. We have made both the code generator and all the tools we developed to carry out the experiments in this paper available as open source.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
论当前SYCL实现的编译性能
Khronos SYCL抽象层的设计目的是支持使用单一源代码库对由主机和加速器设备组成的异构平台进行编程。为了在提供有竞争力的运行时性能的同时提供高层次的抽象,SYCL实现和围绕SYCL应用程序构建的软件生态系统都经常大量使用c++模板。这种设计选择的潜在后果,以及需要为主机和至少一个设备体系结构生成代码,是大量的编译时间。在这项工作中,我们开始研究相对的编译时性能,以及各种SYCL特性对编译时间的影响,选择了最广泛使用的SYCL实现。为此,我们介绍了一个代码生成器,它创建SYCL内核,强调各种API特性和指令类型,无论是单独的还是组合的,以及一个基本的自动化相关实验的基础设施。我们将此基础架构应用于一个总计96000个编译器运行的大规模综合评估,其中还包括对最广泛实现的编译性能随时间变化的研究。除了这些综合实验之外,我们还通过测量两个实际工业SYCL应用程序的编译时间来验证我们的发现的适用性。在这些实验的基础上,我们指出了一些SYCL实现在开发过程中对编译时性能影响特别大的变化,并为SYCL实现开发者和用户提出了建议。我们已经将代码生成器和我们开发的用于执行本文中实验的所有工具作为开源提供。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Improving Performance Portability of the Procedurally Generated High Energy Physics Event Generator MadGraph Using SYCL Acceleration of Quantum Transport Simulations with OpenCL CodePin: An Instrumentation-Based Debug Tool of SYCLomatic An Efficient Approach to Resolving Stack Overflow of SYCL Kernel on Intel® CPUs Ray Tracer based lidar simulation using SYCL
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1