利用 gem5-SALAMv2 扩展硬件加速器系统设计空间探索

IF 3.7 2区 计算机科学 Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Journal of Systems Architecture Pub Date : 2024-06-27 DOI:10.1016/j.sysarc.2024.103211
Zephaniah Spencer, Samuel Rogers, Joshua Slycord, Hamed Tabkhi
{"title":"利用 gem5-SALAMv2 扩展硬件加速器系统设计空间探索","authors":"Zephaniah Spencer,&nbsp;Samuel Rogers,&nbsp;Joshua Slycord,&nbsp;Hamed Tabkhi","doi":"10.1016/j.sysarc.2024.103211","DOIUrl":null,"url":null,"abstract":"<div><p>With the prevalence of hardware accelerators as an integral part of the modern systems on chip (SoCs), the ability to model accelerators quickly and accurately within the system in which it operates is critical. This paper presents gem5-SALAMv2 as a novel system architecture for LLVM-based modeling and simulation of custom hardware accelerators integrated into the gem5 framework. It overcomes the inherent limitations of state-of-the-art trace-based pre-register-transfer level (RTL) simulators by offering a truly “execute-in-execute” LLVM-based model. It enables scalable modeling of multiple dynamically interacting accelerators with full-system simulation support. To create long-term sustainable expansion compatible with the gem5 system framework, gem5-SALAM offers a general-purpose and modular communication interface and memory hierarchy integrated into the gem5 ecosystem, streamlining designing and modeling accelerators for new and emerging applications. gem5-SALAMv2 expands upon the framework established in gem5-SALAMv1 with improved LLVM-based elaboration and simulation, improved and more extensible system integration, and new automations to simplify rapid prototyping and design space exploration. <span><sup>1</sup></span></p><p>Validation on the MachSuite (Reagen et al., 2014) benchmarks presents a timing estimation error of less than 1% against the Vivado High-Level Synthesis (HLS) tool. Results also show less than a 4% area and power estimation error against Synopsys Design Compiler. Additionally, system validation against implementations on an Ultrascale+ ZCU102 shows an average end-to-end timing error of less than 2%. Lastly, we demonstrate the upgraded capabilities of gem5-SALAMv2 by exploring accelerator platforms for two deep neural networks, LeNet5 and MobileNetv2. In these explorations, we demonstrate how gem5-SALAMv2 can simulate such systems and guide architectural optimizations for these types of accelerator-rich architectures. <span><sup>2</sup></span></p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"154 ","pages":"Article 103211"},"PeriodicalIF":3.7000,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Expanding hardware accelerator system design space exploration with gem5-SALAMv2\",\"authors\":\"Zephaniah Spencer,&nbsp;Samuel Rogers,&nbsp;Joshua Slycord,&nbsp;Hamed Tabkhi\",\"doi\":\"10.1016/j.sysarc.2024.103211\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>With the prevalence of hardware accelerators as an integral part of the modern systems on chip (SoCs), the ability to model accelerators quickly and accurately within the system in which it operates is critical. This paper presents gem5-SALAMv2 as a novel system architecture for LLVM-based modeling and simulation of custom hardware accelerators integrated into the gem5 framework. It overcomes the inherent limitations of state-of-the-art trace-based pre-register-transfer level (RTL) simulators by offering a truly “execute-in-execute” LLVM-based model. It enables scalable modeling of multiple dynamically interacting accelerators with full-system simulation support. To create long-term sustainable expansion compatible with the gem5 system framework, gem5-SALAM offers a general-purpose and modular communication interface and memory hierarchy integrated into the gem5 ecosystem, streamlining designing and modeling accelerators for new and emerging applications. gem5-SALAMv2 expands upon the framework established in gem5-SALAMv1 with improved LLVM-based elaboration and simulation, improved and more extensible system integration, and new automations to simplify rapid prototyping and design space exploration. <span><sup>1</sup></span></p><p>Validation on the MachSuite (Reagen et al., 2014) benchmarks presents a timing estimation error of less than 1% against the Vivado High-Level Synthesis (HLS) tool. Results also show less than a 4% area and power estimation error against Synopsys Design Compiler. Additionally, system validation against implementations on an Ultrascale+ ZCU102 shows an average end-to-end timing error of less than 2%. Lastly, we demonstrate the upgraded capabilities of gem5-SALAMv2 by exploring accelerator platforms for two deep neural networks, LeNet5 and MobileNetv2. In these explorations, we demonstrate how gem5-SALAMv2 can simulate such systems and guide architectural optimizations for these types of accelerator-rich architectures. <span><sup>2</sup></span></p></div>\",\"PeriodicalId\":50027,\"journal\":{\"name\":\"Journal of Systems Architecture\",\"volume\":\"154 \",\"pages\":\"Article 103211\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2024-06-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Systems Architecture\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1383762124001486\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems Architecture","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1383762124001486","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

摘要

随着硬件加速器成为现代片上系统(SoC)不可分割的一部分,在其运行的系统中快速、准确地对加速器进行建模的能力至关重要。本文介绍的 gem5-SALAMv2 是一种新型系统架构,用于对集成到 gem5 框架中的定制硬件加速器进行基于 LLVM 的建模和仿真。它通过提供真正基于 LLVM 的 "执行中执行"(execute-in-execute)模型,克服了最先进的基于跟踪的寄存器传输层前(RTL)仿真器的固有局限性。它可对多个动态交互加速器进行可扩展建模,并支持全系统仿真。gem5-SALAMv2在gem5-SALAMv1建立的框架基础上进行了扩展,改进了基于LLVM的阐述和仿真,提高了系统集成的可扩展性,并提供了新的自动化功能来简化快速原型开发和设计空间探索。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Expanding hardware accelerator system design space exploration with gem5-SALAMv2

With the prevalence of hardware accelerators as an integral part of the modern systems on chip (SoCs), the ability to model accelerators quickly and accurately within the system in which it operates is critical. This paper presents gem5-SALAMv2 as a novel system architecture for LLVM-based modeling and simulation of custom hardware accelerators integrated into the gem5 framework. It overcomes the inherent limitations of state-of-the-art trace-based pre-register-transfer level (RTL) simulators by offering a truly “execute-in-execute” LLVM-based model. It enables scalable modeling of multiple dynamically interacting accelerators with full-system simulation support. To create long-term sustainable expansion compatible with the gem5 system framework, gem5-SALAM offers a general-purpose and modular communication interface and memory hierarchy integrated into the gem5 ecosystem, streamlining designing and modeling accelerators for new and emerging applications. gem5-SALAMv2 expands upon the framework established in gem5-SALAMv1 with improved LLVM-based elaboration and simulation, improved and more extensible system integration, and new automations to simplify rapid prototyping and design space exploration. 1

Validation on the MachSuite (Reagen et al., 2014) benchmarks presents a timing estimation error of less than 1% against the Vivado High-Level Synthesis (HLS) tool. Results also show less than a 4% area and power estimation error against Synopsys Design Compiler. Additionally, system validation against implementations on an Ultrascale+ ZCU102 shows an average end-to-end timing error of less than 2%. Lastly, we demonstrate the upgraded capabilities of gem5-SALAMv2 by exploring accelerator platforms for two deep neural networks, LeNet5 and MobileNetv2. In these explorations, we demonstrate how gem5-SALAMv2 can simulate such systems and guide architectural optimizations for these types of accelerator-rich architectures. 2

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Systems Architecture
Journal of Systems Architecture 工程技术-计算机:硬件
CiteScore
8.70
自引率
15.60%
发文量
226
审稿时长
46 days
期刊介绍: The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems including methodologies, techniques and tools for their design as well as novel designs of software components fall within the scope of this journal. Novel applications that use embedded systems are also central in this journal. While hardware is not a part of this journal hardware/software co-design methods that consider interplay between software and hardware components with and emphasis on software are also relevant here.
期刊最新文献
Non-interactive set intersection for privacy-preserving contact tracing NLTSP: A cost model for tensor program tuning using nested loop trees SAMFL: Secure Aggregation Mechanism for Federated Learning with Byzantine-robustness by functional encryption ZNS-Cleaner: Enhancing lifespan by reducing empty erase in ZNS SSDs Using MAST for modeling and response-time analysis of real-time applications with GPUs
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1