A 1-16b Precision Reconfigurable Digital In-Memory Computing Macro Featuring Column-MAC Architecture and Bit-Serial Computation

Hyunjoon Kim, Qian Chen, Taegeun Yoo, T. T. Kim, Bongjin Kim
{"title":"A 1-16b Precision Reconfigurable Digital In-Memory Computing Macro Featuring Column-MAC Architecture and Bit-Serial Computation","authors":"Hyunjoon Kim, Qian Chen, Taegeun Yoo, T. T. Kim, Bongjin Kim","doi":"10.1109/ESSCIRC.2019.8902824","DOIUrl":null,"url":null,"abstract":"This work proposes a digital in-memory computing macro with 1-16b reconfigurable weight and input bit-precisions for energy-efficient DNN processing. The proposed digital macro comprises 128×128 bitcells, and each bitcell consists of three building blocks for in-memory computing, an XNOR-based bitwise multiplier, a full-adder, and an SRAM cell. The two-dimensional bitcell array is then divided into parallel neurons, each with 128× column-shape multiply-and-accumulate (column-MAC) units arranged in a row. Each column-MAC with N-bit variable weight precision is built with ‘N+7’ bitcells in a column (i.e., 8-to-23 bitcells at 1-to-16bit). The N-bit weights are stored at SRAM cells for in-memory computing with the minimal memory access for fetching weights. The remaining 7 bitcells are needed to extend MSBs for accumulating partial-sums through 128 column-MACs. A bit-serial input is broadcasted to all bitcells in the same column, and parallel bitwise multiply operations are performed. Bitwise multiplied results from each column-MAC are then accumulated using N+7 full-adders which are vertically connected to work as a ripple carry adder. Meanwhile, the input precision is determined by the number of bit-serial input cycles from LSB to MSB. Hence, the post-accumulation is required for multi-bit input precision. A 65nm test-chip is fabricated, and the measured energy-efficiency is 117.3 to 2.06TOPS/W at 1-16bit.","PeriodicalId":402948,"journal":{"name":"ESSCIRC 2019 - IEEE 45th European Solid State Circuits Conference (ESSCIRC)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"32","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ESSCIRC 2019 - IEEE 45th European Solid State Circuits Conference (ESSCIRC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ESSCIRC.2019.8902824","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 32

Abstract

This work proposes a digital in-memory computing macro with 1-16b reconfigurable weight and input bit-precisions for energy-efficient DNN processing. The proposed digital macro comprises 128×128 bitcells, and each bitcell consists of three building blocks for in-memory computing, an XNOR-based bitwise multiplier, a full-adder, and an SRAM cell. The two-dimensional bitcell array is then divided into parallel neurons, each with 128× column-shape multiply-and-accumulate (column-MAC) units arranged in a row. Each column-MAC with N-bit variable weight precision is built with ‘N+7’ bitcells in a column (i.e., 8-to-23 bitcells at 1-to-16bit). The N-bit weights are stored at SRAM cells for in-memory computing with the minimal memory access for fetching weights. The remaining 7 bitcells are needed to extend MSBs for accumulating partial-sums through 128 column-MACs. A bit-serial input is broadcasted to all bitcells in the same column, and parallel bitwise multiply operations are performed. Bitwise multiplied results from each column-MAC are then accumulated using N+7 full-adders which are vertically connected to work as a ripple carry adder. Meanwhile, the input precision is determined by the number of bit-serial input cycles from LSB to MSB. Hence, the post-accumulation is required for multi-bit input precision. A 65nm test-chip is fabricated, and the measured energy-efficiency is 117.3 to 2.06TOPS/W at 1-16bit.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
具有列mac结构和位串行计算的1-16b精度可重构数字内存计算宏
这项工作提出了一个具有1-16b可重构权重和输入位精度的数字内存计算宏,用于节能深度神经网络处理。所提出的数字宏包括128×128位单元,每个位单元由三个用于内存计算的构建块、一个基于xnor的位乘法器、一个全加法器和一个SRAM单元组成。然后将二维位单元数组划分为并行神经元,每个神经元具有排成一行的128×柱状乘法和累加(column-MAC)单元。每个具有N位可变权重精度的列mac都在列中使用“N+7”位元(即1- 16位的8- 23位元)构建。n位权重存储在SRAM单元中,用于内存计算,获取权重的内存访问最少。剩余的7位单元格用于扩展msb,以便通过128列mac累积部分和。将位串行输入广播到同一列中的所有位单元格,并执行并行的按位相乘操作。每个列mac的按位相乘结果然后使用N+7个全加法器累积,这些加法器垂直连接以作为纹波进位加法器。同时,输入精度由LSB到MSB的位串行输入周期数决定。因此,需要多比特输入精度的后累加。制作了65nm测试芯片,测得1-16bit时的能量效率为117.3 ~ 2.06TOPS/W。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A 78 fs RMS Jitter Injection-Locked Clock Multiplier Using Transformer-Based Ultra-Low-Power VCO An Integrated Programmable High-Voltage Bipolar Pulser With Embedded Transmit/Receive Switch for Miniature Ultrasound Probes Machine Learning Based Prior-Knowledge-Free Calibration for Split Pipelined-SAR ADCs with Open-Loop Amplifiers Achieving 93.7-dB SFDR An 18 dBm 155-180 GHz SiGe Power Amplifier Using a 4-Way T-Junction Combining Network A Bidirectional Brain Computer Interface with 64-Channel Recording, Resonant Stimulation and Artifact Suppression in Standard 65nm CMOS
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1