Hyunjoon Kim, Qian Chen, Taegeun Yoo, T. T. Kim, Bongjin Kim
{"title":"具有列mac结构和位串行计算的1-16b精度可重构数字内存计算宏","authors":"Hyunjoon Kim, Qian Chen, Taegeun Yoo, T. T. Kim, Bongjin Kim","doi":"10.1109/ESSCIRC.2019.8902824","DOIUrl":null,"url":null,"abstract":"This work proposes a digital in-memory computing macro with 1-16b reconfigurable weight and input bit-precisions for energy-efficient DNN processing. The proposed digital macro comprises 128×128 bitcells, and each bitcell consists of three building blocks for in-memory computing, an XNOR-based bitwise multiplier, a full-adder, and an SRAM cell. The two-dimensional bitcell array is then divided into parallel neurons, each with 128× column-shape multiply-and-accumulate (column-MAC) units arranged in a row. Each column-MAC with N-bit variable weight precision is built with ‘N+7’ bitcells in a column (i.e., 8-to-23 bitcells at 1-to-16bit). The N-bit weights are stored at SRAM cells for in-memory computing with the minimal memory access for fetching weights. The remaining 7 bitcells are needed to extend MSBs for accumulating partial-sums through 128 column-MACs. A bit-serial input is broadcasted to all bitcells in the same column, and parallel bitwise multiply operations are performed. Bitwise multiplied results from each column-MAC are then accumulated using N+7 full-adders which are vertically connected to work as a ripple carry adder. Meanwhile, the input precision is determined by the number of bit-serial input cycles from LSB to MSB. Hence, the post-accumulation is required for multi-bit input precision. A 65nm test-chip is fabricated, and the measured energy-efficiency is 117.3 to 2.06TOPS/W at 1-16bit.","PeriodicalId":402948,"journal":{"name":"ESSCIRC 2019 - IEEE 45th European Solid State Circuits Conference (ESSCIRC)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"32","resultStr":"{\"title\":\"A 1-16b Precision Reconfigurable Digital In-Memory Computing Macro Featuring Column-MAC Architecture and Bit-Serial Computation\",\"authors\":\"Hyunjoon Kim, Qian Chen, Taegeun Yoo, T. T. Kim, Bongjin Kim\",\"doi\":\"10.1109/ESSCIRC.2019.8902824\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This work proposes a digital in-memory computing macro with 1-16b reconfigurable weight and input bit-precisions for energy-efficient DNN processing. The proposed digital macro comprises 128×128 bitcells, and each bitcell consists of three building blocks for in-memory computing, an XNOR-based bitwise multiplier, a full-adder, and an SRAM cell. The two-dimensional bitcell array is then divided into parallel neurons, each with 128× column-shape multiply-and-accumulate (column-MAC) units arranged in a row. Each column-MAC with N-bit variable weight precision is built with ‘N+7’ bitcells in a column (i.e., 8-to-23 bitcells at 1-to-16bit). The N-bit weights are stored at SRAM cells for in-memory computing with the minimal memory access for fetching weights. The remaining 7 bitcells are needed to extend MSBs for accumulating partial-sums through 128 column-MACs. A bit-serial input is broadcasted to all bitcells in the same column, and parallel bitwise multiply operations are performed. Bitwise multiplied results from each column-MAC are then accumulated using N+7 full-adders which are vertically connected to work as a ripple carry adder. Meanwhile, the input precision is determined by the number of bit-serial input cycles from LSB to MSB. Hence, the post-accumulation is required for multi-bit input precision. A 65nm test-chip is fabricated, and the measured energy-efficiency is 117.3 to 2.06TOPS/W at 1-16bit.\",\"PeriodicalId\":402948,\"journal\":{\"name\":\"ESSCIRC 2019 - IEEE 45th European Solid State Circuits Conference (ESSCIRC)\",\"volume\":\"80 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"32\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ESSCIRC 2019 - IEEE 45th European Solid State Circuits Conference (ESSCIRC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ESSCIRC.2019.8902824\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ESSCIRC 2019 - IEEE 45th European Solid State Circuits Conference (ESSCIRC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ESSCIRC.2019.8902824","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A 1-16b Precision Reconfigurable Digital In-Memory Computing Macro Featuring Column-MAC Architecture and Bit-Serial Computation
This work proposes a digital in-memory computing macro with 1-16b reconfigurable weight and input bit-precisions for energy-efficient DNN processing. The proposed digital macro comprises 128×128 bitcells, and each bitcell consists of three building blocks for in-memory computing, an XNOR-based bitwise multiplier, a full-adder, and an SRAM cell. The two-dimensional bitcell array is then divided into parallel neurons, each with 128× column-shape multiply-and-accumulate (column-MAC) units arranged in a row. Each column-MAC with N-bit variable weight precision is built with ‘N+7’ bitcells in a column (i.e., 8-to-23 bitcells at 1-to-16bit). The N-bit weights are stored at SRAM cells for in-memory computing with the minimal memory access for fetching weights. The remaining 7 bitcells are needed to extend MSBs for accumulating partial-sums through 128 column-MACs. A bit-serial input is broadcasted to all bitcells in the same column, and parallel bitwise multiply operations are performed. Bitwise multiplied results from each column-MAC are then accumulated using N+7 full-adders which are vertically connected to work as a ripple carry adder. Meanwhile, the input precision is determined by the number of bit-serial input cycles from LSB to MSB. Hence, the post-accumulation is required for multi-bit input precision. A 65nm test-chip is fabricated, and the measured energy-efficiency is 117.3 to 2.06TOPS/W at 1-16bit.