Loop coarsening in C-based High-Level Synthesis

2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP). Pub Date: 2015-07-27. DOI: 10.1109/ASAP.2015.7245730

Moritz Schmid, Oliver Reiche, Frank Hannig, J. Teich

Pages: 166-173

Citations: 14

Abstract

Current tools for High-Level Synthesis (HLS) excel at exploiting Instruction-Level Parallelism (ILP). In contrast, their support for Data-Level Parallelism (DLP), one of the key advantages of Field Programmable Gate Arrays (FPGAs), is very limited. This work examines the exploitation of DLP on FPGAs using code generation for C-based HLS of image filters and streaming pipelines consisting of point and local operators. In addition to well-known loop tiling techniques, we propose loop coarsening, which delivers superior performance and scalability. Loop tiling corresponds to splitting an image into separate regions, which are then processed in parallel by replicated accelerators. For data streaming, this also requires the generation of glue logic for the distribution of image data. Conversely, loop coarsening allows multiple pixels to be processed in parallel, whereby only the kernel operator is replicated within a single accelerator. We augment the FPGA back end of the heterogeneous Domain-Specific Language (DSL) framework HIPAcc with loop coarsening and compare the resulting FPGA accelerators to highly optimized software implementations for Graphics Processing Units (GPUs), all generated from the exact same code base. Moreover, we demonstrate the advantages of code generation for algorithm development by outlining how design space exploration enabled by HIPAcc can yield a more efficient implementation than hand-coded VHDL.
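The difference between the two strategies can be illustrated with a minimal sketch in plain C++. This is not the code HIPAcc generates: the threshold operator `kernel_op`, the function names, and the template parameters `TILES` and `V` are assumptions made only for illustration, and tool-specific HLS pragmas (unrolling, pipelining, dataflow) are deliberately left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical point operator (a simple threshold), chosen only for illustration.
static inline std::uint8_t kernel_op(std::uint8_t px) {
    return px > 127 ? 255 : 0;
}

// Baseline: one pixel per loop iteration.
std::vector<std::uint8_t> run_baseline(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = kernel_op(in[i]);
    return out;
}

// Loop tiling: the image is split into TILES contiguous regions. In hardware,
// each region would be served by its own replicated accelerator (the whole
// inner loop is duplicated), plus glue logic to distribute the pixel stream.
template <std::size_t TILES>
std::vector<std::uint8_t> run_tiled(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out(in.size());
    const std::size_t chunk = (in.size() + TILES - 1) / TILES;
    for (std::size_t t = 0; t < TILES; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(begin + chunk, in.size());
        for (std::size_t i = begin; i < end; ++i)  // one accelerator instance per tile
            out[i] = kernel_op(in[i]);
    }
    return out;
}

// Loop coarsening: a single loop (single accelerator) handles V neighboring
// pixels per iteration, so only the kernel operator is replicated V times.
template <std::size_t V>
std::vector<std::uint8_t> run_coarsened(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out(in.size());
    std::size_t i = 0;
    for (; i + V <= in.size(); i += V) {
        for (std::size_t v = 0; v < V; ++v)  // would be fully unrolled in hardware
            out[i + v] = kernel_op(in[i + v]);
    }
    for (; i < in.size(); ++i)  // remainder when the size is not a multiple of V
        out[i] = kernel_op(in[i]);
    return out;
}
```

In this sketch both variants produce the same output as the baseline. What differs is the unit of replication: `run_tiled` duplicates the entire loop nest per region, whereas `run_coarsened` keeps one loop and widens its body.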
