Chengshuo Yu, Taegeun Yoo, T. T. Kim, K. Chai, Bongjin Kim
{"title":"A 16K Current-Based 8T SRAM Compute-In-Memory Macro with Decoupled Read/Write and 1-5bit Column ADC","authors":"Chengshuo Yu, Taegeun Yoo, T. T. Kim, K. Chai, Bongjin Kim","doi":"10.1109/CICC48029.2020.9075883","DOIUrl":null,"url":null,"abstract":"A novel 8T SRAM -based bitcell is proposed for current-based compute-in-memory dot-product operations. The proposed bitcell with two extra NMOS transistors (vs. standard 6T SRAM) decouples SRAM read and write operation. A 128×128 8T SRAM bitcell array is built for processing a vector-matrix multiplication (or parallel dot-products) with 64x binary (0 or 1) inputs, 64×128 binary (-1 or +1) weights, and 128x 1-5bit outputs. Each column (i.e. neuron) of the proposed SRAM compute-in-memory macro consists of 64x bitcells for dot-product, 32x bitcells for ADC, and 32x bitcells for calibration. The column-based neuron minimizes the ADC overhead by reusing a sense amplifier for SRAM read. The column-wise ADC converts the analog dot-product results to N-bit output codes (N=1 to 5) by sweeping reference levels using replica bitcells for 2N-1 cycles for each conversion. Monte-Carlo simulations and test-chip measurement results have verified both linearity and process variation. The largest variation (σ=2.48%) results in the MNIST classification accuracy of 96.2% (i.e. 0.4% lower than a baseline with no variation). A test-chip is fabricated using 65nm, and the 16K SRAM bitcell array occupies 0.055mm2. The energy efficiency of the 1bit operation is 490-to-15.8TOPS/W at 1-5bit ADC mode using 0.45/0.8V core supply and 200MHz.","PeriodicalId":409525,"journal":{"name":"2020 IEEE Custom Integrated Circuits Conference (CICC)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"41","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE Custom Integrated Circuits Conference (CICC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CICC48029.2020.9075883","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 41
Abstract
A novel 8T SRAM -based bitcell is proposed for current-based compute-in-memory dot-product operations. The proposed bitcell with two extra NMOS transistors (vs. standard 6T SRAM) decouples SRAM read and write operation. A 128×128 8T SRAM bitcell array is built for processing a vector-matrix multiplication (or parallel dot-products) with 64x binary (0 or 1) inputs, 64×128 binary (-1 or +1) weights, and 128x 1-5bit outputs. Each column (i.e. neuron) of the proposed SRAM compute-in-memory macro consists of 64x bitcells for dot-product, 32x bitcells for ADC, and 32x bitcells for calibration. The column-based neuron minimizes the ADC overhead by reusing a sense amplifier for SRAM read. The column-wise ADC converts the analog dot-product results to N-bit output codes (N=1 to 5) by sweeping reference levels using replica bitcells for 2N-1 cycles for each conversion. Monte-Carlo simulations and test-chip measurement results have verified both linearity and process variation. The largest variation (σ=2.48%) results in the MNIST classification accuracy of 96.2% (i.e. 0.4% lower than a baseline with no variation). A test-chip is fabricated using 65nm, and the 16K SRAM bitcell array occupies 0.055mm2. The energy efficiency of the 1bit operation is 490-to-15.8TOPS/W at 1-5bit ADC mode using 0.45/0.8V core supply and 200MHz.