{"title":"基于机器学习的自动调优增强OpenCL性能可移植性","authors":"Thomas L. Falch, A. Elster","doi":"10.1109/IPDPSW.2015.85","DOIUrl":null,"url":null,"abstract":"Heterogeneous computing, which combines devices with different architectures, is rising in popularity, and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programing such systems, and offers functional portability. It does, however, suffer from poor performance portability, code tuned for one device must be re-tuned to achieve good performance on another device. In this paper, we use machine learning-based auto-tuning to address this problem. Benchmarks are run on a random subset of the entire tuning parameter configuration space, and the results are used to build an artificial neural network based model. The model can then be used to find interesting parts of the parameter space for further search. We evaluate our method with different benchmarks, on several devices, including an Intel i7 3770 CPU, an Nvidia K40 GPU and an AMD Radeon HD 7970 GPU. Our model achieves a mean relative error as low as 6.1%, and is able to find configurations as little as 1.3% worse than the global minimum.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"372 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"39","resultStr":"{\"title\":\"Machine Learning Based Auto-Tuning for Enhanced OpenCL Performance Portability\",\"authors\":\"Thomas L. Falch, A. Elster\",\"doi\":\"10.1109/IPDPSW.2015.85\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Heterogeneous computing, which combines devices with different architectures, is rising in popularity, and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programing such systems, and offers functional portability. It does, however, suffer from poor performance portability, code tuned for one device must be re-tuned to achieve good performance on another device. In this paper, we use machine learning-based auto-tuning to address this problem. Benchmarks are run on a random subset of the entire tuning parameter configuration space, and the results are used to build an artificial neural network based model. The model can then be used to find interesting parts of the parameter space for further search. We evaluate our method with different benchmarks, on several devices, including an Intel i7 3770 CPU, an Nvidia K40 GPU and an AMD Radeon HD 7970 GPU. 
Our model achieves a mean relative error as low as 6.1%, and is able to find configurations as little as 1.3% worse than the global minimum.\",\"PeriodicalId\":340697,\"journal\":{\"name\":\"2015 IEEE International Parallel and Distributed Processing Symposium Workshop\",\"volume\":\"372 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-05-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"39\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE International Parallel and Distributed Processing Symposium Workshop\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW.2015.85\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2015.85","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 39
Abstract
Heterogeneous computing, which combines devices with different architectures, is rising in popularity and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programming such systems and offers functional portability. It does, however, suffer from poor performance portability: code tuned for one device must be re-tuned to achieve good performance on another. In this paper, we use machine learning-based auto-tuning to address this problem. Benchmarks are run on a random subset of the entire tuning-parameter configuration space, and the results are used to build an artificial neural network based model. The model can then be used to find promising parts of the parameter space for further search. We evaluate our method with different benchmarks on several devices, including an Intel i7 3770 CPU, an Nvidia K40 GPU, and an AMD Radeon HD 7970 GPU. Our model achieves a mean relative error as low as 6.1%, and is able to find configurations as little as 1.3% worse than the global minimum.
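
The abstract describes a three-step procedure: benchmark a random sample of configurations, fit a neural-network performance model to the measurements, and use the model to rank the remaining configurations for further search. The sketch below illustrates that loop under stated assumptions: the tuning parameters (work_group_size, unroll_factor, use_local_memory) and the synthetic benchmark() cost function are hypothetical stand-ins, and scikit-learn's MLPRegressor substitutes for the paper's own neural-network implementation. It is a minimal illustration of the technique, not the authors' code.

```python
import itertools
import random
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical OpenCL tuning parameters; a real benchmark exposes its own.
param_space = {
    "work_group_size":  [32, 64, 128, 256],
    "unroll_factor":    [1, 2, 4, 8],
    "use_local_memory": [0, 1],
}
configs = [np.array(c, dtype=float)
           for c in itertools.product(*param_space.values())]

def benchmark(config):
    # Stand-in for compiling and timing an actual OpenCL kernel with this
    # configuration; a synthetic cost function so the sketch runs end to end.
    wg, unroll, local_mem = config
    return abs(np.log2(wg) - 6.0) + 1.0 / unroll + (0.5 if local_mem == 0 else 0.0)

# Step 1: benchmark a random subset of the full configuration space.
random.seed(0)
train = random.sample(configs, k=len(configs) // 4)
times = [benchmark(c) for c in train]

# Step 2: fit an artificial-neural-network model of execution time.
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
model.fit(np.stack(train), times)

# Step 3: predict the time of every configuration and benchmark only the
# most promising candidates, keeping the fastest one actually measured.
predicted = model.predict(np.stack(configs))
candidates = [configs[i] for i in np.argsort(predicted)[:8]]
best = min(candidates, key=benchmark)
print("best configuration:", best, "time:", benchmark(best))
```

The payoff of this design is that querying the model for every configuration is cheap compared with benchmarking each one, so only a small sample plus a handful of model-ranked candidates need to be measured even when the configuration space is large.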