利用 gem5-SALAMv2 扩展硬件加速器系统设计空间探索

IF 3.7 2区计算机科学 Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Journal of Systems Architecture Pub Date : 2024-06-27 DOI:10.1016/j.sysarc.2024.103211

Zephaniah Spencer, Samuel Rogers, Joshua Slycord, Hamed Tabkhi

{"title":"利用 gem5-SALAMv2 扩展硬件加速器系统设计空间探索","authors":"Zephaniah Spencer, Samuel Rogers, Joshua Slycord, Hamed Tabkhi","doi":"10.1016/j.sysarc.2024.103211","DOIUrl":null,"url":null,"abstract":"<div>With the prevalence of hardware accelerators as an integral part of the modern systems on chip (SoCs), the ability to model accelerators quickly and accurately within the system in which it operates is critical. This paper presents gem5-SALAMv2 as a novel system architecture for LLVM-based modeling and simulation of custom hardware accelerators integrated into the gem5 framework. It overcomes the inherent limitations of state-of-the-art trace-based pre-register-transfer level (RTL) simulators by offering a truly “execute-in-execute” LLVM-based model. It enables scalable modeling of multiple dynamically interacting accelerators with full-system simulation support. To create long-term sustainable expansion compatible with the gem5 system framework, gem5-SALAM offers a general-purpose and modular communication interface and memory hierarchy integrated into the gem5 ecosystem, streamlining designing and modeling accelerators for new and emerging applications. gem5-SALAMv2 expands upon the framework established in gem5-SALAMv1 with improved LLVM-based elaboration and simulation, improved and more extensible system integration, and new automations to simplify rapid prototyping and design space exploration. 1Validation on the MachSuite (Reagen et al., 2014) benchmarks presents a timing estimation error of less than 1% against the Vivado High-Level Synthesis (HLS) tool. Results also show less than a 4% area and power estimation error against Synopsys Design Compiler. Additionally, system validation against implementations on an Ultrascale+ ZCU102 shows an average end-to-end timing error of less than 2%. Lastly, we demonstrate the upgraded capabilities of gem5-SALAMv2 by exploring accelerator platforms for two deep neural networks, LeNet5 and MobileNetv2. In these explorations, we demonstrate how gem5-SALAMv2 can simulate such systems and guide architectural optimizations for these types of accelerator-rich architectures. 2</div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"154 ","pages":"Article 103211"},"PeriodicalIF":3.7000,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Expanding hardware accelerator system design space exploration with gem5-SALAMv2\",\"authors\":\"Zephaniah Spencer, Samuel Rogers, Joshua Slycord, Hamed Tabkhi\",\"doi\":\"10.1016/j.sysarc.2024.103211\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>With the prevalence of hardware accelerators as an integral part of the modern systems on chip (SoCs), the ability to model accelerators quickly and accurately within the system in which it operates is critical. This paper presents gem5-SALAMv2 as a novel system architecture for LLVM-based modeling and simulation of custom hardware accelerators integrated into the gem5 framework. It overcomes the inherent limitations of state-of-the-art trace-based pre-register-transfer level (RTL) simulators by offering a truly “execute-in-execute” LLVM-based model. It enables scalable modeling of multiple dynamically interacting accelerators with full-system simulation support. To create long-term sustainable expansion compatible with the gem5 system framework, gem5-SALAM offers a general-purpose and modular communication interface and memory hierarchy integrated into the gem5 ecosystem, streamlining designing and modeling accelerators for new and emerging applications. gem5-SALAMv2 expands upon the framework established in gem5-SALAMv1 with improved LLVM-based elaboration and simulation, improved and more extensible system integration, and new automations to simplify rapid prototyping and design space exploration. 1Validation on the MachSuite (Reagen et al., 2014) benchmarks presents a timing estimation error of less than 1% against the Vivado High-Level Synthesis (HLS) tool. Results also show less than a 4% area and power estimation error against Synopsys Design Compiler. Additionally, system validation against implementations on an Ultrascale+ ZCU102 shows an average end-to-end timing error of less than 2%. Lastly, we demonstrate the upgraded capabilities of gem5-SALAMv2 by exploring accelerator platforms for two deep neural networks, LeNet5 and MobileNetv2. In these explorations, we demonstrate how gem5-SALAMv2 can simulate such systems and guide architectural optimizations for these types of accelerator-rich architectures. 2</div>\",\"PeriodicalId\":50027,\"journal\":{\"name\":\"Journal of Systems Architecture\",\"volume\":\"154 \",\"pages\":\"Article 103211\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2024-06-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Systems Architecture\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1383762124001486\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems Architecture","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1383762124001486","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

摘要

随着硬件加速器成为现代片上系统（SoC）不可分割的一部分，在其运行的系统中快速、准确地对加速器进行建模的能力至关重要。本文介绍的 gem5-SALAMv2 是一种新型系统架构，用于对集成到 gem5 框架中的定制硬件加速器进行基于 LLVM 的建模和仿真。它通过提供真正基于 LLVM 的 "执行中执行"（execute-in-execute）模型，克服了最先进的基于跟踪的寄存器传输层前（RTL）仿真器的固有局限性。它可对多个动态交互加速器进行可扩展建模，并支持全系统仿真。gem5-SALAMv2在gem5-SALAMv1建立的框架基础上进行了扩展，改进了基于LLVM的阐述和仿真，提高了系统集成的可扩展性，并提供了新的自动化功能来简化快速原型开发和设计空间探索。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Expanding hardware accelerator system design space exploration with gem5-SALAMv2

With the prevalence of hardware accelerators as an integral part of the modern systems on chip (SoCs), the ability to model accelerators quickly and accurately within the system in which it operates is critical. This paper presents gem5-SALAMv2 as a novel system architecture for LLVM-based modeling and simulation of custom hardware accelerators integrated into the gem5 framework. It overcomes the inherent limitations of state-of-the-art trace-based pre-register-transfer level (RTL) simulators by offering a truly “execute-in-execute” LLVM-based model. It enables scalable modeling of multiple dynamically interacting accelerators with full-system simulation support. To create long-term sustainable expansion compatible with the gem5 system framework, gem5-SALAM offers a general-purpose and modular communication interface and memory hierarchy integrated into the gem5 ecosystem, streamlining designing and modeling accelerators for new and emerging applications. gem5-SALAMv2 expands upon the framework established in gem5-SALAMv1 with improved LLVM-based elaboration and simulation, improved and more extensible system integration, and new automations to simplify rapid prototyping and design space exploration. ¹

Validation on the MachSuite (Reagen et al., 2014) benchmarks presents a timing estimation error of less than 1% against the Vivado High-Level Synthesis (HLS) tool. Results also show less than a 4% area and power estimation error against Synopsys Design Compiler. Additionally, system validation against implementations on an Ultrascale+ ZCU102 shows an average end-to-end timing error of less than 2%. Lastly, we demonstrate the upgraded capabilities of gem5-SALAMv2 by exploring accelerator platforms for two deep neural networks, LeNet5 and MobileNetv2. In these explorations, we demonstrate how gem5-SALAMv2 can simulate such systems and guide architectural optimizations for these types of accelerator-rich architectures. ²

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Systems Architecture 工程技术-计算机：硬件

CiteScore

8.70

自引率

15.60%

发文量

226

审稿时长

46 days

期刊介绍： The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems including methodologies, techniques and tools for their design as well as novel designs of software components fall within the scope of this journal. Novel applications that use embedded systems are also central in this journal. While hardware is not a part of this journal hardware/software co-design methods that consider interplay between software and hardware components with and emphasis on software are also relevant here.