FPGA Acceleration of Structured-Mesh-Based Explicit and Implicit Numerical Solvers using SYCL

Kamalavasan Kamalakkannan, G. Mudalige, I. Reguly, Suhaib A. Fahmy
{"title":"FPGA Acceleration of Structured-Mesh-Based Explicit and Implicit Numerical Solvers using SYCL","authors":"Kamalavasan Kamalakkannan, G. Mudalige, I. Reguly, Suhaib A. Fahmy","doi":"10.1145/3529538.3530007","DOIUrl":null,"url":null,"abstract":"We explore the design and development of structured-mesh-based solvers on Intel FPGA hardware using the SYCL programming model. Two classes of applications are targeted : (1) stencil applications based on explicit numerical methods and (2) multi-dimensional tridiagonal solvers based on implicit methods. Both classes of solvers appear as core modules in a variety of real-world applications ranging from computational fluid dynamics to financial computing. A general, unified workflow is formulated for synthesizing these applications on Intel FPGAs together with predictive analytic models to explore the design space to obtain optimized performance. Performance of synthesized designs, using the above techniques, for two non-trivial applications on an Intel PAC D5005 FPGA card is benchmarked. Results are compared to the performance of optimized parallel implementations of the same applications on a Nvidia V100 GPU. Observed runtime results indicate the FPGA providing comparable or improved performance to the V100 GPU. However, more importantly the FPGA solutions consume 59%–76% less energy for their largest configurations. Our performance model predicts the runtime of designs with high accuracy with less than 5% error for all cases tested, demonstrating significant utility for design space exploration. With these tools and techniques, we discuss determinants for a given structured-mesh code to be amenable to FPGA implementation, providing insights into the feasibility and profitability FPGA implementation, how to code designs using SYCL, and the resulting performance.","PeriodicalId":73497,"journal":{"name":"International Workshop on OpenCL","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop on OpenCL","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3529538.3530007","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

We explore the design and development of structured-mesh-based solvers on Intel FPGA hardware using the SYCL programming model. Two classes of applications are targeted : (1) stencil applications based on explicit numerical methods and (2) multi-dimensional tridiagonal solvers based on implicit methods. Both classes of solvers appear as core modules in a variety of real-world applications ranging from computational fluid dynamics to financial computing. A general, unified workflow is formulated for synthesizing these applications on Intel FPGAs together with predictive analytic models to explore the design space to obtain optimized performance. Performance of synthesized designs, using the above techniques, for two non-trivial applications on an Intel PAC D5005 FPGA card is benchmarked. Results are compared to the performance of optimized parallel implementations of the same applications on a Nvidia V100 GPU. Observed runtime results indicate the FPGA providing comparable or improved performance to the V100 GPU. However, more importantly the FPGA solutions consume 59%–76% less energy for their largest configurations. Our performance model predicts the runtime of designs with high accuracy with less than 5% error for all cases tested, demonstrating significant utility for design space exploration. With these tools and techniques, we discuss determinants for a given structured-mesh code to be amenable to FPGA implementation, providing insights into the feasibility and profitability FPGA implementation, how to code designs using SYCL, and the resulting performance.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于SYCL的结构网格显式和隐式数值求解的FPGA加速
我们探索使用SYCL编程模型在英特尔FPGA硬件上设计和开发基于结构化网格的求解器。针对两类应用:(1)基于显式数值方法的模板应用和(2)基于隐式方法的多维三对角线求解。这两类求解器在从计算流体动力学到金融计算的各种实际应用中都作为核心模块出现。为在英特尔fpga上综合这些应用程序制定了一个通用的、统一的工作流程,并结合预测分析模型来探索设计空间以获得最佳性能。在英特尔PAC D5005 FPGA卡上,使用上述技术对两个重要应用程序的综合设计性能进行了基准测试。结果与Nvidia V100 GPU上相同应用程序的优化并行实现的性能进行了比较。观察到的运行时结果表明,FPGA提供了与V100 GPU相当或更高的性能。然而,更重要的是,FPGA解决方案在其最大配置中消耗的能量减少了59%-76%。我们的性能模型以高精度预测设计的运行时,在所有测试情况下误差小于5%,证明了设计空间探索的重要实用性。通过这些工具和技术,我们讨论了给定结构化网格代码适用于FPGA实现的决定因素,提供了对FPGA实现的可行性和盈利能力的见解,如何使用SYCL编码设计,以及由此产生的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Improving Performance Portability of the Procedurally Generated High Energy Physics Event Generator MadGraph Using SYCL Acceleration of Quantum Transport Simulations with OpenCL CodePin: An Instrumentation-Based Debug Tool of SYCLomatic An Efficient Approach to Resolving Stack Overflow of SYCL Kernel on Intel® CPUs Ray Tracer based lidar simulation using SYCL
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1