Register Caching for Stencil Computations on GPUs

Thomas L. Falch, A. Elster
Published in: 2014 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing
Publication date: 2014-09-01
DOI: 10.1109/SYNASC.2014.70 (https://doi.org/10.1109/SYNASC.2014.70)
Citations: 14

Abstract

For most applications, taking full advantage of the memory system is key to achieving good performance on GPUs. In this paper, we introduce register caching, a novel idea in which the registers of multiple threads are combined and used as a shared, last-level, manually managed cache for the contributing threads. The method is enabled by the shuffle instruction recently introduced in Nvidia's Kepler GPU architecture, which allows threads in the same warp to exchange data directly; previously, such an exchange was only possible through shared memory. We evaluate our proposal with a stencil computation benchmark, achieving speedups of up to 2.04 over a shared-memory implementation on a GTX680 GPU. Stencil computations form the core of many scientific applications, which can therefore benefit from our proposal. Furthermore, our method is not limited to stencil computations; it is applicable to any application with a predictable memory access pattern suitable for manual caching.
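
To make the idea concrete, the following is a minimal sketch of a warp-level register cache for a 1D three-point averaging stencil. It is not the authors' benchmark code. Everything in it is an assumption made for illustration: the kernel name `stencil3_regcache`, the helper `run_stencil_check`, the clamped boundary handling, the averaging weights, and the choice of 30 outputs per 32-lane warp. It also uses the `__shfl_up_sync` and `__shfl_down_sync` intrinsics from CUDA 9 and later, whereas the Kepler-era work described above would have used the original `__shfl` family. Each thread loads one input value into a register, and its two neighbours are read from adjacent lanes' registers instead of shared memory.

```cuda
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int WARP = 32;
constexpr int OUT_PER_WARP = WARP - 2;  // lanes 1..30 write; lanes 0 and 31 only hold halo values

// Hypothetical kernel: each warp caches 32 consecutive inputs in registers
// (one per lane) and produces 30 outputs, overlapping its neighbours by 2.
__global__ void stencil3_regcache(const float* in, float* out, int n) {
    int lane = threadIdx.x % WARP;
    int warp = (blockIdx.x * blockDim.x + threadIdx.x) / WARP;
    int idx  = warp * OUT_PER_WARP + lane - 1;     // input element held by this lane
    int cidx = min(max(idx, 0), n - 1);            // clamp at the array ends (assumed boundary rule)
    float center = in[cidx];                       // single global load; the register is the cache

    const unsigned full = 0xffffffffu;             // all 32 lanes take part in every shuffle
    float left  = __shfl_up_sync(full, center, 1);    // value held by lane - 1
    float right = __shfl_down_sync(full, center, 1);  // value held by lane + 1

    if (lane >= 1 && lane <= OUT_PER_WARP && idx < n)
        out[idx] = (left + center + right) / 3.0f;
}

// Runs the kernel on n elements and returns the number of outputs that
// differ from a CPU reference by more than a small tolerance.
int run_stencil_check(int n) {
    std::vector<float> h_in(n), h_out(n), ref(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i % 17);
    for (int i = 0; i < n; ++i) {
        float l = h_in[std::max(i - 1, 0)];
        float r = h_in[std::min(i + 1, n - 1)];
        ref[i] = (l + h_in[i] + r) / 3.0f;
    }

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int warps   = (n + OUT_PER_WARP - 1) / OUT_PER_WARP;
    int threads = 128;                             // must be a multiple of the warp size
    int blocks  = (warps * WARP + threads - 1) / threads;
    stencil3_regcache<<<blocks, threads>>>(d_in, d_out, n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (std::fabs(h_out[i] - ref[i]) > 1e-5f) ++mismatches;
    return mismatches;
}

int main() {
    int mismatches = run_stencil_check(1000);
    std::printf("mismatches vs CPU reference: %d\n", mismatches);
    return mismatches != 0;
}
```

In this sketch the two edge lanes of each warp load halo values that the neighbouring warp also loads, so 2 of every 32 global loads are redundant. Wider stencils, or several elements per thread, change that trade-off; the paper's actual layout and tuning may differ from what is shown here.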