Exploiting Computation Reuse for Stencil Accelerators.

Yuze Chi, Jason Cong
Proceedings. Design Automation Conference, vol. 2020. Published 2020-07-01; Epub 2020-10-09.
DOI: 10.1109/dac18072.2020.9218680 (https://doi.org/10.1109/dac18072.2020.9218680)
Citations: 7

Abstract

Stencil kernels are an important class of kernels used extensively in many application domains. Over the years, researchers have studied optimizations for parallelization, communication reuse, and computation reuse on various target platforms. However, challenges remain, especially for the computation reuse problem on accelerators, because existing work lacks complete design-space exploration and effective design-space pruning. In this paper, we present solutions to these challenges for a wide range of stencil kernels (i.e., stencils with reduction operations), where the computation reuse patterns are extremely flexible due to the commutative and associative properties. We formally define the complete design space, and based on this definition we present a provably optimal dynamic programming algorithm and a heuristic beam search algorithm that provides near-optimal solutions under an architecture-aware model. Experimental results show that, when synthesizing stencil kernels to FPGAs, our proposed algorithm reduces look-up table (LUT) and digital signal processor (DSP) usage by 58.1% and 54.6% on average, respectively, compared with a state-of-the-art stencil compiler without computation reuse capability. This leads to an average speedup of 2.3× for compute-intensive kernels, outperforming the latest CPU/GPU results.
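
To make the idea of computation reuse concrete, the following is a minimal sketch for a 3×3 box-sum stencil, a simple stencil with an addition reduction. It is not the paper's dynamic programming or beam search algorithm; it only illustrates how associativity and commutativity allow partial sums to be shared between neighboring outputs. The function names `box_sum_naive` and `box_sum_reuse` and the addition counters are assumptions introduced for illustration.

```python
def box_sum_naive(grid):
    """3x3 box sum at every interior point, with no reuse.

    Each output needs 8 additions to combine its 9 inputs.
    Returns (output grid, number of additions performed).
    """
    h, w = len(grid), len(grid[0])
    out = [[0] * (w - 2) for _ in range(h - 2)]
    adds = 0
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            total = grid[i - 1][j - 1]  # first operand, no addition yet
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if (di, dj) != (-1, -1):
                        total += grid[i + di][j + dj]
                        adds += 1
            out[i - 1][j - 1] = total
    return out, adds


def box_sum_reuse(grid):
    """Same result, but vertical 3-point partial sums are computed once
    and shared by the three horizontally adjacent outputs that need them.

    Reordering the reduction this way is valid because addition is
    commutative and associative.
    """
    h, w = len(grid), len(grid[0])
    adds = 0
    # col[i - 1][j] holds grid[i-1][j] + grid[i][j] + grid[i+1][j].
    col = [[0] * w for _ in range(h - 2)]
    for i in range(1, h - 1):
        for j in range(w):
            col[i - 1][j] = grid[i - 1][j] + grid[i][j] + grid[i + 1][j]
            adds += 2
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for i in range(h - 2):
        for j in range(1, w - 1):
            out[i][j - 1] = col[i][j - 1] + col[i][j] + col[i][j + 1]
            adds += 2
    return out, adds


if __name__ == "__main__":
    grid = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(6)]
    naive, naive_adds = box_sum_naive(grid)
    reuse, reuse_adds = box_sum_reuse(grid)
    print(naive == reuse, naive_adds, reuse_adds)
```

On large grids the shared version approaches 4 additions per output instead of 8. Which partial sums to share, and in what order, is the kind of choice that makes up the design space the abstract describes; this sketch fixes one such choice by hand.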
