Multidie 3-D Stacking of Memory Dominated Neuromorphic Architectures

IF 2.8 · CAS Zone 2 (Engineering & Technology) · JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · IEEE Transactions on Very Large Scale Integration (VLSI) Systems · Pub Date: 2024-07-25 · DOI: 10.1109/TVLSI.2024.3421625
Leandro M. Giacomini Rocha;Refik Bilgic;Mohamed Naeim;Sudipta Das;Herman Oprins;Amirreza Yousefzadeh;Mario Konijnenburg;Dragomir Milojevic;James Myers;Julien Ryckaert;Dwaipayan Biswas
Volume 32, Issue 11, pp. 2144-2148 · https://ieeexplore.ieee.org/document/10609345/
Citations: 0

Abstract

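Event-driven neuromorphic processors for artificial intelligence (AI) inference on edge/IoT devices require large on-chip memory capacity to execute spiking neural networks (SNNs) efficiently. In this work, we evaluate the benefits of 3-D stacking for SENECA, a digital neuromorphic accelerator core, sweeping its on-chip memory capacity from 2 Mb up to 32 Mb in both legacy planar and advanced nanosheet CMOS logic nodes. In a planar CMOS node (GF-22 nm), two-die memory-on-logic (MoL) partitioning enables 8× more on-chip memory and boosts the operating frequency by 7% at 26% lower power than the 2-D implementation. Moving to an advanced nanosheet technology (imec A10), multidie MoL stacking (up to 7 dies) enables a performance increase of up to 29% and power savings of up to 31%. Furthermore, core folding (CF) partitioning in A10 yields up to 16% higher performance and 12% lower total power than the 2-D implementation in the same technology. We also show that multidie stacking at advanced nodes incurs no thermal overhead for designs with low power density. These physical design explorations lay the foundation for system technology co-optimization studies for edge devices.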
Source journal
CiteScore: 6.40
Self-citation rate: 7.10%
Publication volume: 187
Review time: 3.6 months
Journal description: The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels. To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize and merit the novel systems integration aspects of microelectronic systems including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.