SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers

Paul Scheffler, Luca Colagrande, Luca Benini
arXiv:2404.05303 (arXiv - CS - Mathematical Software), published 2024-04-08

Abstract

Stencil codes are performance-critical in many compute-intensive applications but suffer from significant address-calculation and irregular-memory-access overheads. This work presents SARIS, a general and highly flexible methodology for stencil acceleration using register-mapped indirect streams. We demonstrate SARIS for various stencil codes on an eight-core RISC-V compute cluster with indirect stream registers, achieving, on average, speedups of 2.72x, near-ideal FPU utilizations of 81%, and energy-efficiency improvements of 1.58x over an RV32G baseline. Scaling out to a 256-core manycore system, we estimate an average FPU utilization of 64%, an average speedup of 2.14x, and up to 15% higher fractions of peak compute than a leading GPU code generator.
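The abstract attributes much of the overhead in stencil codes to address calculation. As a rough illustration only, the plain C sketch below contrasts a 5-point Jacobi stencil that recomputes every neighbor address in the loop body with a variant that reads neighbors through a precomputed offset table. That table is the kind of index sequence an indirect stream register could, in principle, deliver to the FPU without spending integer instructions on it. The sketch is not the SARIS code and uses no RISC-V stream extensions; the stencil shape, the coefficient 0.2, and all function names (`jacobi_direct`, `build_offsets`, `jacobi_indexed`, `stencils_agree`) are hypothetical choices for this example.

```c
#include <assert.h>
#include <stdlib.h>

#define NPTS 5 /* points in the hypothetical 5-point star stencil */

/* Baseline: every neighbor address is recomputed from (x, y) in the loop body. */
static void jacobi_direct(const double *in, double *out, int nx, int ny) {
    for (int y = 1; y < ny - 1; y++) {
        for (int x = 1; x < nx - 1; x++) {
            int c = y * nx + x;
            out[c] = 0.2 * (in[c] + in[c - 1] + in[c + 1] + in[c - nx] + in[c + nx]);
        }
    }
}

/* Relative offsets of the five stencil points, computed once per grid width. */
static void build_offsets(int nx, int off[NPTS]) {
    assert(nx >= 3); /* need at least one interior column */
    off[0] = 0;
    off[1] = -1;
    off[2] = 1;
    off[3] = -nx;
    off[4] = nx;
}

/* Index-driven variant: neighbors are read through the precomputed offset
 * table. On hardware with indirect stream registers, an index sequence like
 * this could be streamed instead of generated by integer arithmetic. */
static void jacobi_indexed(const double *in, double *out, int nx, int ny) {
    int off[NPTS];
    build_offsets(nx, off);
    for (int y = 1; y < ny - 1; y++) {
        for (int x = 1; x < nx - 1; x++) {
            const double *base = in + y * nx + x;
            double acc = 0.0;
            for (int k = 0; k < NPTS; k++)
                acc += base[off[k]]; /* same summation order as jacobi_direct */
            out[y * nx + x] = 0.2 * acc;
        }
    }
}

/* Returns 1 if both variants produce identical results on a synthetic grid. */
static int stencils_agree(int nx, int ny) {
    size_t n = (size_t)nx * (size_t)ny;
    double *in = malloc(n * sizeof *in);
    double *a = calloc(n, sizeof *a);
    double *b = calloc(n, sizeof *b);
    int ok = (in != NULL && a != NULL && b != NULL);
    if (ok) {
        for (size_t i = 0; i < n; i++)
            in[i] = (double)(i % 7) - 3.0; /* small deterministic test values */
        jacobi_direct(in, a, nx, ny);
        jacobi_indexed(in, b, nx, ny);
        for (size_t i = 0; i < n && ok; i++)
            ok = (a[i] == b[i]);
    }
    free(in);
    free(a);
    free(b);
    return ok;
}
```

Calling `stencils_agree(6, 5)` runs both variants on the same grid and returns 1 when their outputs match. In portable C the offset table merely relocates the address arithmetic; any gain of the kind the abstract reports relies on hardware support such as indirect stream registers.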