A 1-16b Precision Reconfigurable Digital In-Memory Computing Macro Featuring Column-MAC Architecture and Bit-Serial Computation

ESSCIRC 2019 - IEEE 45th European Solid State Circuits Conference (ESSCIRC)
Pub Date: 2019-09-01
DOI: 10.1109/ESSCIRC.2019.8902824

Hyunjoon Kim, Qian Chen, Taegeun Yoo, T. T. Kim, Bongjin Kim


Citations: 32

Abstract

This work proposes a digital in-memory computing macro with reconfigurable 1-16b weight and input bit-precision for energy-efficient DNN processing. The macro comprises 128×128 bitcells, and each bitcell consists of three building blocks for in-memory computing: an XNOR-based bitwise multiplier, a full adder, and an SRAM cell. The two-dimensional bitcell array is divided into parallel neurons, each with 128 column-shaped multiply-and-accumulate (column-MAC) units arranged in a row. Each column-MAC with N-bit variable weight precision is built from N+7 bitcells in a column (i.e., 8 to 23 bitcells at 1 to 16 bits). The N-bit weights are stored in the SRAM cells, so in-memory computing requires minimal memory access for fetching weights. The remaining 7 bitcells extend the MSBs so that partial sums can be accumulated across the 128 column-MACs. A bit-serial input is broadcast to all bitcells in the same column, and the bitwise multiplications are performed in parallel. The bitwise products from each column-MAC are then accumulated using N+7 full adders, which are connected vertically to form a ripple-carry adder. The input precision is determined by the number of bit-serial input cycles, from LSB to MSB; hence, post-accumulation is required for multi-bit input precision. A 65nm test chip was fabricated, and the measured energy efficiency ranges from 117.3 TOPS/W at 1-bit to 2.06 TOPS/W at 16-bit precision.
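The bit-serial scheme described in the abstract can be illustrated with a small behavioral model. The following is only a sketch under stated assumptions, not the authors' circuit: weights and inputs are treated as unsigned integers, a plain AND stands in for the XNOR-based bitwise multiplier (the abstract does not give the number encoding), and the names `bitcells_per_column` and `bit_serial_dot` are hypothetical. It shows why 7 extra bits suffice for summing 128 N-bit values (128 × (2^N − 1) < 2^(N+7)) and how post-accumulation by shift-and-add recovers the multi-bit result.

```python
import random

NUM_COLUMNS = 128  # column-MACs per neuron, as stated in the abstract
EXTRA_BITS = 7     # MSB extension; 7 = log2(128)


def bitcells_per_column(weight_bits):
    """Bitcells used by one column-MAC with N-bit weights (N + 7)."""
    return weight_bits + EXTRA_BITS


def bit_serial_dot(weights, inputs, weight_bits, input_bits):
    """Behavioral model of a bit-serial dot product (unsigned, assumed encoding).

    One input bit per cycle, LSB first, is applied to every weight.
    The per-cycle partial sum must fit in N + 7 bits; the per-cycle
    results are then combined by shift-and-add (post-accumulation).
    """
    acc_width = bitcells_per_column(weight_bits)
    total = 0
    for cycle in range(input_bits):
        partial = 0
        for w, x in zip(weights, inputs):
            x_bit = (x >> cycle) & 1
            # 1-bit input times N-bit weight (AND used here as an assumption)
            partial += w if x_bit else 0
        # Sum of 128 N-bit values never overflows N + 7 bits
        assert partial < (1 << acc_width)
        # Post-accumulation: weight this cycle's sum by its bit position
        total += partial << cycle
    return total


# Usage: compare against a direct dot product at 4-bit precision
rng = random.Random(0)
weights = [rng.randrange(1 << 4) for _ in range(NUM_COLUMNS)]
inputs = [rng.randrange(1 << 4) for _ in range(NUM_COLUMNS)]
assert bit_serial_dot(weights, inputs, 4, 4) == sum(
    w * x for w, x in zip(weights, inputs)
)
```

In this model the number of loop iterations equals the input precision, which mirrors the abstract's statement that input precision is set by the number of bit-serial cycles.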


