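An area-efficient 6T-SRAM based Compute-In-Memory architecture with reconfigurable SAR ADCs for energy-efficient deep neural networks in edge ML applications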

Avishek Biswas, Hetul Sanghvi, M. Mehendale, G. Preet
Published in: 2022 IEEE Custom Integrated Circuits Conference (CICC), 2022-04-01
DOI: 10.1109/CICC53496.2022.9772789 (https://doi.org/10.1109/CICC53496.2022.9772789)
Citations: 6

Abstract

Compute-In-Memory (CIM) is a promising approach to enable low-power Machine Learning (ML) based applications on edge devices, since it significantly reduces data movement by embedding computations inside or near the memory, unlike traditional all-digital implementations. Conventional 6-transistor (6T) SRAM bit-cell based CIM approaches [1]–[3] suffer from bit-cell disturb issues caused by accessing multiple cells in a column, which limits the dynamic voltage range allowed for analog dot-product (DP) computations. They are also highly prone to bit-cell discharge current (Icell) variation, which degrades the overall accuracy of neural network (NN) inference. Alternative approaches, e.g. [4], require a custom-designed 10T bit-cell that occupies 2-3x more cell area. To address these challenges, we present an area-efficient CIM approach (CIM-D6T) that uses compact 6T foundry bit-cells while achieving robustness to bit-cell Vt variations and eliminating read disturb issues, improving the dynamic voltage range for DP. This is achieved by decoupling the 6T cell read from the analog DP computation. As shown in Fig. 1, a pair of extra metal capacitors (Cm), connected to the lines XAp and XAn, is added over the SRAM column to store and process the analog voltages for the DPs. The 6T cells in a row are read locally, and the read data values are used in the local LRW+MAVa circuit to discharge the analog voltage on the XAp/XAn capacitor to ground. These extra capacitors do not consume additional silicon area, since they are implemented as metal comb capacitors over the existing SRAM array using higher metal layers. Fig. 1 shows the overall architecture of the proposed CIM half-array with 256x64 6T bit-cells, split into 16 sub-arrays, each with 16 rows and 64 columns. Weights for different 3D filters in a given NN layer (output channel dimension) are mapped to different sub-arrays. A group of 2 local columns with 16 rows each forms 1 mux-ed local column (LCOLmx); hence, each sub-array has 32 parallel ports for input feature map (IFMP) values and the weights. The LCOLmx units along the vertical dimension share a single DAC, which converts a 6-b unsigned digital input (XIN[5:0]) to an analog voltage (0 to Vref). The same analog voltage (Va) is shared across all sub-arrays along a column.
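The array arithmetic in the abstract can be checked with a short Python sketch. The constants come from the text (256x64 half-array, 16 sub-arrays, 2 local columns per LCOLmx, 6-b input). The function names are hypothetical, and the DAC transfer function is an assumption: the abstract only says the output spans 0 to Vref, so the sketch assumes an ideal linear DAC in which code 63 maps exactly to Vref.

```python
# Geometry of the CIM half-array as stated in the abstract.
ROWS, COLS = 256, 64          # 6T bit-cells in the half-array
NUM_SUBARRAYS = 16            # sub-arrays stacked along the row dimension
LOCAL_COLS_PER_MUX = 2        # local columns grouped into one LCOLmx
DAC_BITS = 6                  # width of XIN[5:0]


def subarray_shape():
    """Return (rows, columns) of one sub-array."""
    return ROWS // NUM_SUBARRAYS, COLS


def ports_per_subarray():
    """Return the number of mux-ed local columns (parallel ports) per sub-array."""
    return COLS // LOCAL_COLS_PER_MUX


def ideal_dac(code, vref=1.0):
    """Map an unsigned 6-b code to a voltage in [0, vref].

    Assumption (not from the source): ideal linear DAC, full-scale code -> vref.
    """
    max_code = 2 ** DAC_BITS - 1
    if not 0 <= code <= max_code:
        raise ValueError(f"code must be in 0..{max_code}")
    return vref * code / max_code


if __name__ == "__main__":
    print(subarray_shape())       # sub-array dimensions
    print(ports_per_subarray())   # parallel IFMP/weight ports
    print(ideal_dac(32))          # mid-scale input under the ideal-DAC assumption
```

A real DAC may use a different full-scale convention (for example code/64 of Vref), so the voltage mapping here is illustrative only; the geometry figures follow directly from the numbers in the abstract.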