How to optimize Compute Drivers? Let’s start with writing good benchmarks!

Michał Mrozek
International Workshop on OpenCL (IWOCL), May 10, 2022
DOI: 10.1145/3529538.3529569 (https://doi.org/10.1145/3529538.3529569)

Abstract

Writing an efficient driver stack is the goal of every driver developer, but to see whether your stack is performant, you need tools that can confirm it. You can run workloads and benchmarks to see how your driver performs, but that only gives you a summarized score made up of many pieces. Optimizing further requires extensive work in understanding the applications, identifying the bottleneck, and removing it, which is a time-consuming process involving a lot of effort. This created a need for the driver team to write a tool that would make performance work on the driver easier, so we created Compute Benchmarks. In this suite we test all aspects of the driver stack to check that they have no bottlenecks. Each test checks only one thing and does so in isolation, so it is easy to optimize and requires no extensive setup. The benchmarks focus on subtle aspects of every driver, such as the API overhead of every call, submission latencies, resource creation costs, transfer bandwidths, multi-threaded contention, multi-process execution, and many others. The framework supports multiple backends; we currently have OpenCL and Level Zero implementations in place, so it is easy to compare how the same scenario is serviced by different drivers. It is also easy to compare driver implementations between vendors, as tests written in OpenCL simply work across different GPU implementations. We also use this code to present good and bad coding practices; this is very useful for showing how simple changes can drastically improve performance, and users can run those scenarios themselves to see how performance changes on their own setups. It is also a great tool for prototyping new extensions before proposing them as part of the OpenCL standard. We plan to open-source this project in Q2 2022, and it is expected to be available by IWOCL.