{"title":"基于机器学习的自动调优增强OpenCL性能可移植性","authors":"Thomas L. Falch, A. Elster","doi":"10.1109/IPDPSW.2015.85","DOIUrl":null,"url":null,"abstract":"Heterogeneous computing, which combines devices with different architectures, is rising in popularity, and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programing such systems, and offers functional portability. It does, however, suffer from poor performance portability, code tuned for one device must be re-tuned to achieve good performance on another device. In this paper, we use machine learning-based auto-tuning to address this problem. Benchmarks are run on a random subset of the entire tuning parameter configuration space, and the results are used to build an artificial neural network based model. The model can then be used to find interesting parts of the parameter space for further search. We evaluate our method with different benchmarks, on several devices, including an Intel i7 3770 CPU, an Nvidia K40 GPU and an AMD Radeon HD 7970 GPU. Our model achieves a mean relative error as low as 6.1%, and is able to find configurations as little as 1.3% worse than the global minimum.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"372 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"39","resultStr":"{\"title\":\"Machine Learning Based Auto-Tuning for Enhanced OpenCL Performance Portability\",\"authors\":\"Thomas L. Falch, A. Elster\",\"doi\":\"10.1109/IPDPSW.2015.85\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Heterogeneous computing, which combines devices with different architectures, is rising in popularity, and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programing such systems, and offers functional portability. It does, however, suffer from poor performance portability, code tuned for one device must be re-tuned to achieve good performance on another device. In this paper, we use machine learning-based auto-tuning to address this problem. Benchmarks are run on a random subset of the entire tuning parameter configuration space, and the results are used to build an artificial neural network based model. The model can then be used to find interesting parts of the parameter space for further search. We evaluate our method with different benchmarks, on several devices, including an Intel i7 3770 CPU, an Nvidia K40 GPU and an AMD Radeon HD 7970 GPU. 
Our model achieves a mean relative error as low as 6.1%, and is able to find configurations as little as 1.3% worse than the global minimum.\",\"PeriodicalId\":340697,\"journal\":{\"name\":\"2015 IEEE International Parallel and Distributed Processing Symposium Workshop\",\"volume\":\"372 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-05-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"39\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE International Parallel and Distributed Processing Symposium Workshop\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW.2015.85\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2015.85","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 39
Abstract
Heterogeneous computing, which combines devices with different architectures, is rising in popularity and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programming such systems and offers functional portability. It does, however, suffer from poor performance portability: code tuned for one device must be re-tuned to achieve good performance on another. In this paper, we use machine learning-based auto-tuning to address this problem. Benchmarks are run on a random subset of the entire tuning-parameter configuration space, and the results are used to build an artificial neural network based model. The model can then be used to find promising parts of the parameter space for further search. We evaluate our method with different benchmarks on several devices, including an Intel i7 3770 CPU, an Nvidia K40 GPU, and an AMD Radeon HD 7970 GPU. Our model achieves a mean relative error as low as 6.1%, and is able to find configurations as little as 1.3% worse than the global minimum.
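
The abstract describes a three-step procedure: benchmark a random sample of configurations, fit a neural-network performance model to the measurements, and use the model to rank the remaining configurations for further search. The sketch below illustrates that loop under stated assumptions: the tuning parameters (work_group_size, unroll_factor, use_local_memory) and the synthetic benchmark() cost function are hypothetical stand-ins, and scikit-learn's MLPRegressor substitutes for the paper's own neural-network implementation. It is a minimal illustration of the technique, not the authors' code.

```python
import itertools
import random
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical OpenCL tuning parameters; a real benchmark exposes its own.
param_space = {
    "work_group_size":  [32, 64, 128, 256],
    "unroll_factor":    [1, 2, 4, 8],
    "use_local_memory": [0, 1],
}
configs = [np.array(c, dtype=float)
           for c in itertools.product(*param_space.values())]

def benchmark(config):
    # Stand-in for compiling and timing an actual OpenCL kernel with this
    # configuration; a synthetic cost function so the sketch runs end to end.
    wg, unroll, local_mem = config
    return abs(np.log2(wg) - 6.0) + 1.0 / unroll + (0.5 if local_mem == 0 else 0.0)

# Step 1: benchmark a random subset of the full configuration space.
random.seed(0)
train = random.sample(configs, k=len(configs) // 4)
times = [benchmark(c) for c in train]

# Step 2: fit an artificial-neural-network model of execution time.
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
model.fit(np.stack(train), times)

# Step 3: predict the time of every configuration and benchmark only the
# most promising candidates, keeping the fastest one actually measured.
predicted = model.predict(np.stack(configs))
candidates = [configs[i] for i in np.argsort(predicted)[:8]]
best = min(candidates, key=benchmark)
print("best configuration:", best, "time:", benchmark(best))
```

The payoff of this design is that querying the model for every configuration is cheap compared with benchmarking each one, so only a small sample plus a handful of model-ranked candidates need to be measured even when the configuration space is large.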