Optimized two-level parallelization for GPU accelerators using the polyhedral model

Proceedings of the 26th International Conference on Compiler Construction. Pub Date: 2017-02-05. DOI: 10.1145/3033019.3033022

J. Shirako, Akihiro Hayashi, Vivek Sarkar


Citations: 11

Abstract




While GPUs play an increasingly important role in today's high-performance computers, optimizing GPU performance continues to impose large burdens upon programmers. A major challenge in optimizing codes for GPUs stems from the two levels of hardware parallelism, blocks and threads; each of these levels has significantly different characteristics, requiring different optimization strategies. In this paper, we propose a novel compiler optimization algorithm for GPU parallelism. Our approach is based on the polyhedral model, which has enabled significant advances in program analysis and transformation compared to traditional AST-based frameworks. We extend polyhedral schedules to enable two-level parallelization through the idea of superposition, which integrates separate schedules for block-level and thread-level parallelism. Our experimental results demonstrate that our proposed compiler optimization framework can deliver 1.8x and 2.1x geometric mean improvements on NVIDIA Tesla M2050 and K80 GPUs, compared to a state-of-the-art polyhedral parallel code generator (PPCG) for GPGPUs.
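The two levels of parallelism mentioned in the abstract can be illustrated with a small sketch. This is not the paper's superposition algorithm and not output from PPCG; it only emulates, sequentially on the CPU, how a one-dimensional loop is split into a block-level index and a thread-level index. The function names `global_index` and `two_level_vector_add` and the parameter `threads_per_block` are hypothetical and chosen for illustration.

```python
def global_index(block_idx, thread_idx, threads_per_block):
    """Map a (block, thread) pair to a flat iteration index."""
    return block_idx * threads_per_block + thread_idx


def two_level_vector_add(a, b, threads_per_block=4):
    """Add two equal-length lists using an emulated block/thread decomposition.

    On a real GPU both loops below would run in parallel; here they run
    sequentially so the index mapping is easy to inspect.
    """
    n = len(a)
    out = [0] * n
    # Round up so that a partially filled last block still covers the tail.
    num_blocks = (n + threads_per_block - 1) // threads_per_block
    for block_idx in range(num_blocks):              # block-level parallelism
        for thread_idx in range(threads_per_block):  # thread-level parallelism
            i = global_index(block_idx, thread_idx, threads_per_block)
            if i < n:  # guard: threads past the end of the data do nothing
                out[i] = a[i] + b[i]
    return out


result = two_level_vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50])
print(result)
```

In this emulation the outer loop plays the role of blocks and the inner loop the role of threads within a block. Choosing which loop of a real program maps to which level, and how, is the kind of decision the abstract says requires different optimization strategies at each level.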

