gem5-accel: A Pre-RTL Simulation Toolchain for Accelerator Architecture Validation

IEEE Computer Architecture Letters; IF 1.4; Zone 3 (Computer Science); Q4 (Computer Science, Hardware & Architecture); Pub Date: 2023-11-01; DOI: 10.1109/LCA.2023.3329443

João Vieira;Nuno Roma;Gabriel Falcao;Pedro Tomás

Citations: 0

Abstract

Attaining the performance and efficiency levels demanded by modern applications often requires application-specific accelerators. However, writing synthesizable Register-Transfer Level (RTL) code for such accelerators is a complex, expensive, and time-consuming process, which makes it cumbersome in early architecture development phases. To tackle this issue, this work proposes a pre-synthesis simulation toolchain that facilitates the early architectural evaluation of complex accelerators integrated with multi-level memory hierarchies. To demonstrate its usefulness, the proposed gem5-accel is used to model a tensor accelerator based on Gemmini, showing that it can successfully anticipate the results of complex hardware accelerators executing deep neural networks.

Source journal: IEEE Computer Architecture Letters (Computer Science, Hardware & Architecture)

CiteScore: 4.60

Self-citation rate: 4.30%

Journal description: IEEE Computer Architecture Letters is a rigorously peer-reviewed forum for publishing early, high-impact results in the areas of uni- and multiprocessor computer systems, computer architecture, microarchitecture, workload characterization, performance evaluation and simulation techniques, and power-aware computing. Submissions are welcomed on any topic in computer architecture, especially but not limited to: microprocessor and multiprocessor systems, microarchitecture and ILP processors, workload characterization, performance evaluation and simulation techniques, compiler-hardware and operating system-hardware interactions, interconnect architectures, memory and cache systems, power and thermal issues at the architecture level, I/O architectures and techniques, independent validation of previously published results, analysis of unsuccessful techniques, domain-specific processor architectures (e.g., embedded, graphics, network, etc.), real-time and high-availability architectures, reconfigurable systems.
