FDRA: A Framework for Dynamically Reconfigurable Accelerator Supporting Multi-Level Parallelism

IF 2.8 | Zone 4 (Computer Science) | JCR Q2 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE) | ACM Transactions on Reconfigurable Technology and Systems | Pub Date: 2023-10-10 | DOI: 10.1145/3614224

Yunhui Qiu, Yiqing Mao, Xuchen Gao, Sichao Chen, Jiangnan Li, Wenbo Yin, Lingli Wang


Citations: 0

Abstract


FDRA: A Framework for Dynamically Reconfigurable Accelerator Supporting Multi-Level Parallelism

Coarse-grained reconfigurable architectures (CGRAs) have emerged as promising accelerators due to their high flexibility and energy efficiency. However, existing open-source works often lack integration of CGRAs with CPU systems and the corresponding toolchains. Moreover, accelerator instruction pipelining that overlaps data communication, computation, and configuration across multiple tasks is rarely supported. In this paper, we propose FDRA, an open-source exploration framework for a heterogeneous system-on-chip (SoC) with a RISC-V processor and a dynamically reconfigurable accelerator (DRA) supporting loop-, instruction-, and task-level parallelism. FDRA encompasses parameterized SoC modeling, Verilog generation, source-to-source application code transformation using frontend and DRA compilers, SoC simulation, and FPGA prototyping. FDRA extracts periodic accumulative operators and multidimensional linear load/store operators from nested loops. The DRA accesses the shared L2 cache with virtual addresses and supports direct memory access (DMA) with arbitrary start addresses and data lengths. Integrated into the RISC-V Rocket SoC, our DRA achieves a 55× speedup for loop kernels and improves energy efficiency by 29×. Compared to state-of-the-art RISC-V vector units, our DRA demonstrates a 2.9× speed improvement and 3.5× greater energy efficiency. Compared to previous CGRA+RISC-V SoCs, our SoC achieves a minimum speedup of 5.2×.
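The abstract does not describe the format of FDRA's multidimensional linear load/store operators, so the following is only a minimal sketch of the general idea, under the assumption that such an operator is a base address plus one (stride, count) pair per loop level. The function names, parameters, and the example addresses are all hypothetical and are not taken from FDRA. The sketch checks that the address stream of a two-level nested loop over a block of a row-major array matches what a single linear operator generates.

```python
from itertools import product


def linear_addresses(base, strides, counts):
    """Yield base + sum(stride_i * index_i) for every index tuple.

    Dimensions are listed outermost first, so the last dimension varies fastest,
    mirroring the iteration order of a nested loop.
    """
    for idx in product(*(range(n) for n in counts)):
        yield base + sum(s * i for s, i in zip(strides, idx))


def nested_loop_addresses(base, rows, cols, row_pitch, elem_size):
    """Reference: addresses touched by a plain two-level loop over a sub-block."""
    out = []
    for r in range(rows):
        for c in range(cols):
            out.append(base + r * row_pitch + c * elem_size)
    return out


# Hypothetical example: a 3x4 block of 4-byte elements, rows 64 bytes apart,
# starting at byte address 0x1000.
ADDRS = list(linear_addresses(0x1000, strides=(64, 4), counts=(3, 4)))
REFERENCE = nested_loop_addresses(0x1000, rows=3, cols=4, row_pitch=64, elem_size=4)
```

Because the whole access pattern is captured by a handful of integers, hardware can in principle generate the addresses itself rather than having the processor compute each one. Whether FDRA encodes its operators this way is not stated in the abstract.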


Source journal

ACM Transactions on Reconfigurable Technology and Systems (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

CiteScore: 4.90
Self-citation rate: 8.70%
Articles published: not listed
Review time: >12 weeks

Journal description: TRETS is the top journal focusing on research in, on, and with reconfigurable systems and on their underlying technology. The scope, rationale, and coverage by other journals are often limited to particular aspects of reconfigurable technology or reconfigurable systems. TRETS is a journal that covers reconfigurability in its own right. Topics appropriate for TRETS include all levels of reconfigurable system abstractions and all aspects of reconfigurable technology, including platforms, programming environments, and application successes that support these systems for computing or other applications.

- The board and systems architectures of a reconfigurable platform.
- Programming environments of reconfigurable systems, especially those designed for use with reconfigurable systems that will lead to increased programmer productivity.
- Languages and compilers for reconfigurable systems.
- Logic synthesis and related tools, as they relate to reconfigurable systems.
- Applications on which success can be demonstrated.
- The underlying technology from which reconfigurable systems are developed. (Currently this technology is that of FPGAs, but research on the nature and use of follow-on technologies is appropriate for TRETS.)

In considering whether a paper is suitable for TRETS, the foremost question should be whether reconfigurability has been essential to success. Topics such as architecture, programming languages, compilers and environments, logic synthesis, and high-performance applications are all suitable if the context is appropriate. For example, an architecture for an embedded application that happens to use FPGAs is not necessarily suitable for TRETS, but an architecture using FPGAs for which the reconfigurability of the FPGAs is an inherent part of the specifications (perhaps due to a need for re-use on multiple applications) would be appropriate for TRETS.