Fast Behavioural RTL Simulation of 10B Transistor SoC Designs with Metro-MPI

2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), Pub Date: 2023-04-01, DOI: 10.23919/DATE56975.2023.10137080

Guillem López-Paradís, Brian Li, Adrià Armejach, Stefan Wallentowitz, Miquel Moretó, Jonathan Balkind


Citations: 1

Abstract

Chips with tens of billions of transistors have become today's norm. These designs are straining our electronic design automation tools throughout the design process, requiring ever more computational resources. In many tools, parallelisation has improved both latency and throughput for the designer's benefit. However, tools largely remain restricted to a single machine and, in the case of RTL simulation, we believe that this leaves much potential performance on the table. We introduce Metro-MPI to improve RTL simulation for modern 10 billion transistor-scale chips. Metro-MPI exploits the natural boundaries present in chip designs to partition RTL simulations and leverage High Performance Computing (HPC) techniques to extract parallelism. For chip designs that scale in size by exploiting latency-insensitive interfaces like networks-on-chip and AXI, Metro-MPI offers a new paradigm for RTL simulation scalability. Our implementation of Metro-MPI in OpenPiton+Ariane delivers 2.7 MIPS of RTL simulation throughput for the first time on a design with more than 10 billion transistors and 1,024 Linux-capable cores, opening new avenues for distributed RTL simulation of emerging system-on-chip designs. Compared to sequential and multithreaded RTL simulations of smaller designs, Metro-MPI achieves speedups of up to $135.98\times$ and $9.29\times$, respectively. Similarly, for a representative regression run, Metro-MPI reduces energy consumption by up to $2.53\times$ and $2.91\times$, respectively.
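The abstract's central premise is that latency-insensitive interfaces form natural partition boundaries. The toy sketch below illustrates only that general property, not the authors' implementation: it uses no MPI and no RTL, and the names `Channel` and `simulate` are hypothetical. A consumer partition acts only when valid data has arrived, so the values it observes should not depend on how long the link takes to deliver them.

```python
from collections import deque


class Channel:
    """Hypothetical latency-insensitive link: in-order delivery after a fixed delay."""

    def __init__(self, latency):
        self.latency = latency
        self.in_flight = deque()  # entries are (arrival_cycle, payload)

    def send(self, cycle, payload):
        self.in_flight.append((cycle + self.latency, payload))

    def recv(self, cycle):
        """Return the oldest payload if it has arrived, else None (no valid data)."""
        if self.in_flight and self.in_flight[0][0] <= cycle:
            return self.in_flight.popleft()[1]
        return None


def simulate(latency, n_messages=5):
    """Step a producer and a consumer partition joined by one Channel.

    Returns the values seen by the consumer and the number of cycles simulated.
    """
    link = Channel(latency)
    sent, received, cycle = 0, [], 0
    while len(received) < n_messages:
        if sent < n_messages:        # producer partition: one message per cycle
            link.send(cycle, sent * sent)
            sent += 1
        msg = link.recv(cycle)       # consumer partition: acts only on valid data
        if msg is not None:
            received.append(msg)
        cycle += 1
    return received, cycle


if __name__ == "__main__":
    for latency in (0, 3, 7):
        values, cycles = simulate(latency)
        print(f"latency={latency}: values={values}, cycles={cycles}")
```

In this sketch, a longer link delay increases the number of simulated cycles but leaves the received values unchanged, which is the property that makes such interfaces candidate cut points when a simulation is split across processes.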


