HLS Implementation of a Building Cube Stencil Computation Framework for an FPGA Accelerator

Daiki Furukawa, Taito Manabe, Yuichiro Shibata, Tomohiro Ueno, Kentaro Sano
2024 IEEE International Conference on Consumer Electronics (ICCE), pp. 1-6. Published 2024-01-06. DOI: 10.1109/ICCE59016.2024.10444277 (https://doi.org/10.1109/ICCE59016.2024.10444277)

Abstract

FPGAs are promising energy-efficient accelerators for compute-intensive applications such as electromagnetic field simulation, which is also an important task in consumer product design. In particular, stencil computation, a common computing pattern in scientific and engineering simulations, is known to map well to FPGAs. In practical simulations, data reduction methods such as the building cube method (BCM) are often used to balance computational accuracy and speed. However, such techniques tend to introduce irregular memory access patterns, which makes it difficult for application programmers to implement efficient memory access hardware on FPGAs. In this paper, we propose a design framework for stencil computation with BCM that lets application programmers focus on algorithm implementation without having to deal with memory access optimization. We implement the framework on an Intel FPGA PAC D5005 platform and evaluate it in terms of resource utilization, execution time, and throughput. The area overhead of the proposed BCM framework is small, leaving sufficient resources for user applications. The performance evaluation shows that the measured throughput of the BCM framework is more than 90% lower than that of non-BCM execution because of the irregular memory access patterns. However, because BCM significantly reduces the number of cells to be computed, overall computation is up to 28 times faster, indicating that the throughput reduction is acceptable.
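
The trade-off in the last two sentences of the abstract can be illustrated with a simple model in which execution time is the number of cells divided by throughput. The sketch below is not taken from the paper: the function names and the numbers are hypothetical, chosen only to be consistent with "more than 90% lower throughput" and "up to 28 times faster", and it assumes both figures refer to the same run, which the abstract does not state.

```python
def speedup(cell_reduction: float, throughput_ratio: float) -> float:
    """Overall speedup of a BCM run over a uniform-grid run.

    cell_reduction   = cells in uniform grid / cells in BCM grid
    throughput_ratio = BCM throughput / non-BCM throughput
    Assumes time = cells / throughput for both runs (a simplification).
    """
    return cell_reduction * throughput_ratio


def break_even_reduction(throughput_ratio: float) -> float:
    """Smallest cell reduction at which BCM is no slower than non-BCM."""
    return 1.0 / throughput_ratio


# Hypothetical values, NOT measurements from the paper:
# BCM throughput is 1/16 of non-BCM (a 93.75% drop),
# and BCM needs 448 times fewer cells.
ratio = 0.0625
reduction = 448.0

print(speedup(reduction, ratio))
print(break_even_reduction(ratio))
```

Under this model, a throughput loss only pays off once the reduction in cell count exceeds the inverse of the throughput ratio. This is why a large drop in throughput can still yield a net gain in computation speed.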