Multidie 3-D Stacking of Memory Dominated Neuromorphic Architectures

IF 3.1 | CAS Zone 2 (Engineering & Technology) | JCR Q2 (Computer Science, Hardware & Architecture) | IEEE Transactions on Very Large Scale Integration (VLSI) Systems | Pub Date: 2024-07-25 | DOI: 10.1109/TVLSI.2024.3421625

Leandro M. Giacomini Rocha;Refik Bilgic;Mohamed Naeim;Sudipta Das;Herman Oprins;Amirreza Yousefzadeh;Mario Konijnenburg;Dragomir Milojevic;James Myers;Julien Ryckaert;Dwaipayan Biswas

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 11, pp. 2144-2148. https://ieeexplore.ieee.org/document/10609345/

Citations: 0

Abstract

Event-driven neuromorphic processors for artificial intelligence (AI) inference on edge/IoT devices require large on-chip memory capacity for efficient execution of spiking neural networks (NNs). In this work, we evaluate 3-D stacking benefits on SENECA, a digital neuromorphic accelerator core, sweeping its on-chip memory capacity from 2 up to 32 Mb in both legacy planar and advanced nanosheet CMOS logic nodes. In a planar CMOS node (GF-22 nm), two-die memory-on-logic (MoL) partitioning enables $8\times$ more on-chip memory, and it boosts operating frequency by 7% with 26% less power than the 2-D implementation. Moving to an advanced nanosheet technology (imec A10), multidie (up to 7 dies) MoL stacking enables a performance increase of up to 29% and power savings of up to 31%. Furthermore, a core folding (CF) partitioning in A10 shows up to 16% performance improvement with 12% total power savings with respect to the 2-D implementation on the same technology. We also demonstrate no thermal overhead for multidie stacking at advanced nodes for designs exhibiting low power density. These physical design explorations lay the foundation for system technology co-optimization studies for edge devices.
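To put the reported percentages on a common scale, the sketch below combines each performance gain and power saving into a single performance-per-watt ratio relative to the 2-D baseline. This is an illustrative calculation, not a figure from the paper. It assumes that the frequency gain can stand in for performance in the GF-22 nm case, and that the quoted "up to" gain and saving occur in the same configuration, which the abstract does not state. The function name `perf_per_watt_gain` and the dictionary labels are hypothetical.

```python
def perf_per_watt_gain(perf_gain_pct: float, power_saving_pct: float) -> float:
    """Relative performance-per-watt versus the 2-D baseline (1.0 = no change).

    Assumption: both percentages are measured against the same 2-D design
    and apply to the same stacked configuration.
    """
    perf = 1.0 + perf_gain_pct / 100.0       # e.g. +7%  -> 1.07
    power = 1.0 - power_saving_pct / 100.0   # e.g. -26% -> 0.74
    return perf / power


# Figures quoted in the abstract: (performance gain %, power saving %).
# For GF-22 nm the "performance" figure is the operating-frequency gain.
reported = {
    "GF-22 nm, two-die MoL": (7, 26),
    "imec A10, multidie MoL": (29, 31),
    "imec A10, core folding": (16, 12),
}

for name, (gain, saving) in reported.items():
    ratio = perf_per_watt_gain(gain, saving)
    print(f"{name}: {ratio:.2f}x performance per watt vs. 2-D")
```

Under these assumptions the ratios come out at roughly 1.45x, 1.87x, and 1.32x respectively. Because the A10 figures are upper bounds, the A10 ratios should be read as best-case estimates rather than measured results.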




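Source journal

IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 6.40
Self-citation rate: 7.10%
Articles published: 187
Review time: 3.6 months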

Journal description: The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels. To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize and merit the novel systems integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.