Graph-Oriented Code Transformation Approach for Register-Limited Stencils on GPUs

Mengyao Jin, H. Fu, Zihong Lv, Guangwen Yang
Published in: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Publication date: 2016-05-16
DOI: 10.1109/CCGrid.2016.13 (https://doi.org/10.1109/CCGrid.2016.13)
Citations: 0

Abstract

Stencil kernels play an important role in many scientific and engineering disciplines. As numerical algorithms develop and accuracy requirements rise, register-limited stencils containing large numbers of variables and operations have come into wide use. However, these stencils consume vast resources when executed on GPUs. Excessive register use dramatically reduces the number of active threads and consequently causes a serious performance decline. To improve the performance of register-limited stencils, we propose a DDG (data-dependency-graph) oriented code transformation approach. By analyzing, deleting, and transforming the original stencil program on GPUs, the approach searches for the best trade-off between the amount of calculation and the degree of parallelism, and thereby achieves better performance. We evaluate the approach on the Weighted Nearly Analytic Discrete stencil; the experimental results show a speedup of 2.16x over the original, fairly optimized implementation. To the best of our knowledge, this study takes the first step towards balancing the amount of calculation and the degree of parallelism of extremely register-limited stencils on GPUs.
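
The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the authors' algorithm: it is a generic illustration, written in Python for brevity, in which a straight-line program is a list of (destination, sources) pairs, register demand is estimated as the peak number of simultaneously live values, and a shared term is recomputed instead of being held in a register. The two programs, the function names, and the hardware figures (65536 registers and 2048 threads per multiprocessor, warps of 32 threads) are illustrative assumptions, not values taken from the paper.

```python
from typing import Dict, List, Tuple

# One instruction of a straight-line program: (destination, source operands).
# An empty source list stands for a load from memory. Destinations are unique.
Instr = Tuple[str, List[str]]


def peak_live_values(program: List[Instr]) -> int:
    """Conservative register estimate: the largest number of values alive at once."""
    defined_at: Dict[str, int] = {}
    last_use: Dict[str, int] = {}
    for i, (dest, srcs) in enumerate(program):
        defined_at[dest] = i
        last_use[dest] = i          # a value is alive at least where it is defined
        for s in srcs:
            last_use[s] = i         # later uses extend the live range
    return max(
        sum(1 for v in defined_at if defined_at[v] <= i <= last_use[v])
        for i in range(len(program))
    )


def active_threads(regs_per_thread: int, regs_per_sm: int = 65536,
                   max_threads_per_sm: int = 2048, warp_size: int = 32) -> int:
    """Threads resident on one multiprocessor if registers are the only limit.

    The default hardware figures are assumptions chosen for illustration.
    """
    by_registers = regs_per_sm // regs_per_thread
    whole_warps = (by_registers // warp_size) * warp_size
    return min(whole_warps, max_threads_per_sm)


# Hypothetical original: three shared terms are computed once and kept alive.
ORIGINAL: List[Instr] = [
    ("a", []), ("b", []), ("s1", ["a", "b"]),
    ("c", []), ("d", []), ("s2", ["c", "d"]),
    ("e", []), ("f", []), ("s3", ["e", "f"]),
    ("o1", ["s1", "s2"]), ("o2", ["s2", "s3"]), ("o3", ["s1", "s3"]),
]

# Hypothetical transformed version: s1 is released after o1 and recomputed
# (as s1b, from reloaded inputs a2 and b2) just before o3 needs it.
TRANSFORMED: List[Instr] = [
    ("a", []), ("b", []), ("s1", ["a", "b"]),
    ("c", []), ("d", []), ("s2", ["c", "d"]),
    ("o1", ["s1", "s2"]),
    ("e", []), ("f", []), ("s3", ["e", "f"]),
    ("o2", ["s2", "s3"]),
    ("a2", []), ("b2", []), ("s1b", ["a2", "b2"]),
    ("o3", ["s1b", "s3"]),
]

for name, prog in (("original", ORIGINAL), ("transformed", TRANSFORMED)):
    print(name, "instructions:", len(prog), "peak live values:", peak_live_values(prog))

for regs in (255, 128, 64, 32):
    print(regs, "registers/thread ->", active_threads(regs), "active threads")
```

In this toy case the transformed program executes more instructions but needs fewer live values at its peak, which is the kind of exchange between calculation amount and parallelism that the abstract refers to. Real register allocation and occupancy rules are more involved than this model (allocation granularity, shared-memory limits, and block-size limits are all ignored here).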