T. Lloyd, Artem Chikin, Sanket Kedia, D. Jain, J. N. Amaral
{"title":"Automated GPU Grid Geometry Selection for OPENMP Kernels","authors":"T. Lloyd, Artem Chikin, Sanket Kedia, D. Jain, J. N. Amaral","doi":"10.1109/CAHPC.2018.8645848","DOIUrl":null,"url":null,"abstract":"Modern supercomputers are increasingly using GPUs to improve performance per watt. Generating GPU code for target regions in openMP 4.0, or later versions, requires the selection of grid geometry to execute the GPU kernel. Existing industrial-strength compilers use a simple heuristic with arbitrary numbers that are constant for all kernels. After characterizing the relationship between region features, grid geometry and performance, we built a machine-learning model that successfully predicts a suitable geometry for such kernels and results in a performance improvement with a geometric mean of 5% across the benchmarks studied. However, this prediction is impractical because the overhead of the predictor is too high. A careful study of the results of the predictor allowed for the development of a practical low-overhead heuristic that resulted in a performance improvement of up to 7 times with a geometric mean of 25.9%. This paper describes the methodology to build the machine-learning model, and the practical low-overhead heuristic that can be used in industry-strong compilers.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"234 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CAHPC.2018.8645848","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
Modern supercomputers are increasingly using GPUs to improve performance per watt. Generating GPU code for target regions in openMP 4.0, or later versions, requires the selection of grid geometry to execute the GPU kernel. Existing industrial-strength compilers use a simple heuristic with arbitrary numbers that are constant for all kernels. After characterizing the relationship between region features, grid geometry and performance, we built a machine-learning model that successfully predicts a suitable geometry for such kernels and results in a performance improvement with a geometric mean of 5% across the benchmarks studied. However, this prediction is impractical because the overhead of the predictor is too high. A careful study of the results of the predictor allowed for the development of a practical low-overhead heuristic that resulted in a performance improvement of up to 7 times with a geometric mean of 25.9%. This paper describes the methodology to build the machine-learning model, and the practical low-overhead heuristic that can be used in industry-strong compilers.