A Programmable Neural-Network Inference Accelerator Based on Scalable In-Memory Computing

2021 IEEE International Solid-State Circuits Conference (ISSCC). Published: 2021-02-13. DOI: 10.1109/ISSCC42613.2021.9365788

Hongyang Jia, Murat Ozatay, Yinqi Tang, Hossein Valavi, Rakshit Pathak, Jinseok Lee, N. Verma

Citations: 80

Abstract

This paper presents a scalable neural-network (NN) inference accelerator in 16nm, based on an array of programmable cores employing mixed-signal In-Memory Computing (IMC), digital Near-Memory Computing (NMC), and localized buffering/control. IMC achieves high energy efficiency and throughput for matrix-vector multiplications (MVMs), which dominate NN computation. However, scaling IMC poses numerous challenges: technologically, it must move to advanced nodes to maintain its gains over digital architectures, and architecturally, it must support full execution of diverse NNs. Recent demonstrations have explored integrating IMC into programmable processors [1, 2], but have not retained IMC's efficiency and throughput in full NN executions. The central challenge is that IMC incurs physical design points and associated tradeoffs drastically different from those of digital engines. Namely, IMC substantially increases compute energy efficiency and HW density/parallelism, but retains the overheads of HW virtualization (state and data swapping/buffering/communication across spatial/temporal computation mappings). To overcome these overheads, the demonstrated architecture is co-designed with SW-mapping algorithms (encapsulated in a custom graph compiler) that provide efficiency across a broad range of mapping strategies.
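To make the notion of a spatial computation mapping concrete, here is a minimal, hypothetical sketch (not the paper's compiler or hardware interface). It assumes each core's IMC array can hold only a fixed-size tile of an MVM weight matrix, so a larger layer is split into tiles placed on separate cores. Each core computes its tile's partial MVM, and the partial sums are then accumulated digitally, standing in for near-memory computing. The tile sizes `TILE_OUT` and `TILE_IN` and the function names `tile_weights`, `core_mvm`, and `mapped_mvm` are illustrative assumptions, and exact integer arithmetic stands in for the mixed-signal in-memory compute.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]
Vector = List[int]

# Hypothetical per-core array capacity; NOT taken from the paper.
TILE_OUT = 2  # output rows of W held by one core's array
TILE_IN = 3   # input columns of W held by one core's array


def tile_weights(w: Matrix, tile_out: int, tile_in: int) -> Dict[Tuple[int, int], Matrix]:
    """Split W (out_dim x in_dim) into tiles keyed by their (row, col) offset.

    In this sketch, each tile is assumed to be mapped to one core.
    """
    tiles: Dict[Tuple[int, int], Matrix] = {}
    for o in range(0, len(w), tile_out):
        for i in range(0, len(w[0]), tile_in):
            tiles[(o, i)] = [row[i:i + tile_in] for row in w[o:o + tile_out]]
    return tiles


def core_mvm(tile: Matrix, x_slice: Vector) -> Vector:
    """Stand-in for one core's in-memory MVM (exact integer math here)."""
    return [sum(wij * xj for wij, xj in zip(row, x_slice)) for row in tile]


def mapped_mvm(w: Matrix, x: Vector) -> Vector:
    """Compute W @ x by running every tile and accumulating partial sums."""
    y = [0] * len(w)
    for (o, i), tile in tile_weights(w, TILE_OUT, TILE_IN).items():
        partial = core_mvm(tile, x[i:i + TILE_IN])
        # Digital accumulation of partial sums (near-memory computing stand-in).
        for k, p in enumerate(partial):
            y[o + k] += p
    return y


if __name__ == "__main__":
    W = [[1, 2, 3, 4, 5], [0, 1, 0, 1, 0], [2, 2, 2, 2, 2]]
    x = [1, 2, 3, 4, 5]
    print(mapped_mvm(W, x))  # prints [55, 6, 30], same as an untiled W @ x
```

Here the 3x5 matrix occupies four hypothetical cores. A temporal mapping would instead reuse fewer cores by loading different tiles over time; in this simplified picture, that reloading is where the state and data swapping overheads described in the abstract would appear.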
