Enhancing Design Space Exploration by Extending CPU/GPU Specifications onto FPGAs

Muhsen Owaida, G. Falcão, J. Andrade, C. Antonopoulos, Nikolaos Bellas, M. Purnaprajna, D. Novo, G. Karakonstantis, A. Burg, P. Ienne
ACM Trans. Embed. Comput. Syst. Journal Article. Published 2015-03-25. DOI: 10.1145/2656207 (https://doi.org/10.1145/2656207)
Citations: 11

Abstract

The design cycle for complex special-purpose computing systems is extremely costly and time-consuming. It involves a multiparametric design space exploration for optimization, followed by design verification. Designers of special-purpose VLSI implementations often need to explore parameters, such as optimal bitwidth and data representation, through time-consuming Monte Carlo simulations. A prominent example of this simulation-based exploration process is the design of decoders for error-correcting systems, such as the Low-Density Parity-Check (LDPC) codes adopted by modern communication standards, which involves thousands of Monte Carlo runs for each design point. Currently, high-performance computing offers a wide set of acceleration options that range from multicore CPUs to Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). The exploitation of diverse target architectures is typically associated with developing multiple code versions, often using distinct programming paradigms. In this context, we evaluate the concept of retargeting a single OpenCL program to multiple platforms, thereby significantly reducing design time. A single OpenCL-based parallel kernel is used without modifications or code tuning on multicore CPUs, GPUs, and FPGAs. We use SOpenCL (Silicon to OpenCL), a tool that automatically converts OpenCL kernels to RTL, to introduce FPGAs as a potential platform for efficiently executing simulations coded in OpenCL. We use LDPC decoding simulations as a case study. Experimental results were obtained by testing a variety of regular and irregular LDPC codes that range from short/medium (e.g., 8,000-bit) to long (e.g., 64,800-bit) DVB-S2 codes. We observe that, depending on the design parameters to be simulated and on the dimension and phase of the design, the GPU or FPGA may suit different purposes more conveniently, thus providing different acceleration factors over conventional multicore CPUs.
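
To make the kind of simulation-based exploration described in the abstract concrete, the following is a minimal Python sketch of a Monte Carlo bit-error-rate sweep over the bitwidth of quantized channel LLRs. It is not the authors' OpenCL kernel. Everything in it is an illustrative assumption: the toy 7-bit parity-check code in `CHECKS` (a real DVB-S2 code is far larger), the helper names `quantize`, `encode`, `min_sum_decode` and `estimate_ber`, the quantization step of 0.5, the BPSK/AWGN channel model, and the choice of min-sum decoding. Only the channel LLRs are quantized here; the messages inside the decoder stay in floating point.

```python
import math
import random

# Hypothetical toy parity-check structure: each row lists the bits in one check.
# Bits 0-3 carry the message, bits 4-6 are parity bits.
CHECKS = [
    [0, 1, 2, 4],
    [1, 2, 3, 5],
    [0, 2, 3, 6],
]
N_BITS = 7


def quantize(llr, bitwidth, step=0.5):
    """Round an LLR to a signed fixed-point value with `bitwidth` bits."""
    max_level = 2 ** (bitwidth - 1) - 1
    level = max(-max_level, min(max_level, round(llr / step)))
    return level * step


def encode(msg):
    """Systematic encoding: append one parity bit per row of CHECKS."""
    b0, b1, b2, b3 = msg
    return [b0, b1, b2, b3, b0 ^ b1 ^ b2, b1 ^ b2 ^ b3, b0 ^ b2 ^ b3]


def min_sum_decode(channel_llrs, iterations=10):
    """Flooding min-sum decoding; returns a hard decision (0/1) per bit."""
    c2b = [[0.0] * len(row) for row in CHECKS]  # check-to-bit messages
    totals = list(channel_llrs)
    for _ in range(iterations):
        # Check-node update from extrinsic bit-to-check messages.
        new_c2b = []
        for ci, row in enumerate(CHECKS):
            b2c = [totals[bit] - c2b[ci][pos] for pos, bit in enumerate(row)]
            msgs = []
            for pos in range(len(row)):
                others = b2c[:pos] + b2c[pos + 1:]
                sign = -1.0 if sum(m < 0 for m in others) % 2 else 1.0
                msgs.append(sign * min(abs(m) for m in others))
            new_c2b.append(msgs)
        c2b = new_c2b
        # Bit-node update: channel LLR plus all incoming check messages.
        totals = list(channel_llrs)
        for ci, row in enumerate(CHECKS):
            for pos, bit in enumerate(row):
                totals[bit] += c2b[ci][pos]
    return [1 if t < 0 else 0 for t in totals]


def estimate_ber(bitwidth, snr_db, frames, rng):
    """Monte Carlo estimate of the bit error rate for one design point."""
    rate = 4 / 7
    # snr_db is interpreted as Eb/N0 in dB for unit-energy BPSK symbols.
    sigma = math.sqrt(1.0 / (2.0 * rate * 10 ** (snr_db / 10.0)))
    errors = 0
    for _ in range(frames):
        word = encode([rng.randrange(2) for _ in range(4)])
        llrs = []
        for bit in word:
            y = (1.0 - 2.0 * bit) + rng.gauss(0.0, sigma)  # BPSK over AWGN
            llrs.append(quantize(2.0 * y / sigma ** 2, bitwidth))
        decoded = min_sum_decode(llrs)
        errors += sum(d != b for d, b in zip(decoded, word))
    return errors / (frames * N_BITS)


if __name__ == "__main__":
    rng = random.Random(0)
    for bitwidth in (2, 4, 6):
        ber = estimate_ber(bitwidth, snr_db=3.0, frames=2000, rng=rng)
        print(f"bitwidth={bitwidth}: BER ~ {ber:.4f}")
```

Each call to `estimate_ber` corresponds to one design point. The frames within a design point are independent of each other, which is the kind of parallelism that a single data-parallel kernel can exploit across CPUs, GPUs, and FPGAs.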