DCT-RAM: A Driver-Free Process-In-Memory 8T SRAM Macro with Multi-Bit Charge-Domain Computation and Time-Domain Quantization

Zhiyu Chen, Qing Jin, Zhanghao Yu, Yanzhi Wang, Kaiyuan Yang
Published in: 2022 IEEE Custom Integrated Circuits Conference (CICC)
Publication date: 2022-04-01
DOI: 10.1109/CICC53496.2022.9772826 (https://doi.org/10.1109/CICC53496.2022.9772826)
Citations: 5

Abstract

Process-In-Memory (PIM) is a promising approach to alleviating the memory-wall bottleneck in memory-intensive applications such as CNNs. Recent SRAM-based PIM designs, particularly those computing in the charge domain [1]–[5], have greatly improved the linearity of analog multiply-and-add (MAC) computations and quantization, as well as their robustness to process variations, bringing their inference accuracy close to that of digital hardware on practical computer vision benchmarks such as CIFAR-10. However, several limitations still hinder large-scale integration of PIM macros, especially the assumed availability of powerful external reference voltage drivers and the lack of scaling-friendly designs. First, high-bandwidth analog buffers driving large output loads are needed to distribute the massive number of analog signals (e.g., DAC outputs) across the macro without sacrificing signal fidelity or computing speed. For example, [10] reports that its DAC drivers occupy 11.4% of the macro area and incur a 94-pJ energy overhead in 28 nm, accounting for 68.5% of the total energy in a macro supporting $5\mathrm{b}$ activations and $8\mathrm{b}$ weights. Second, SAR ADCs are popular for the common 5–9 bit resolution range, but conventional SAR ADCs require high-speed, power-hungry analog buffers to drive the capacitive DACs (CDACs) to the reference voltages with short settling time and high accuracy. Given the hundreds of ADCs in each macro, the design complexity and overheads incurred by these drivers are dominant. Our simulated reference driver consumes 2.9 pJ in 65 nm, which is comparable to the energy of an ADC (e.g., 3.56 $\text{pJ}$ in [12]). Third, it is challenging to fit any conventional $\geq 7\mathrm{b}$ SAR ADC into the narrow width of SRAM cells because of the bulky CDACs and layout matching requirements, which ultimately limits computing parallelism and energy amortization.
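
The overhead figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the helper `implied_total_energy` and all variable names are hypothetical, and it assumes that the 94 pJ and the 68.5% share in [10] refer to the same unit of work, which the abstract does not spell out. The driver-to-ADC ratio also compares a 65 nm simulation with the figure from [12] as quoted, so it should be read as a rough indication rather than a like-for-like comparison.

```python
# Back-of-envelope check of the overhead figures quoted in the abstract.
# Inputs are the numbers stated in the text; every derived value is an
# estimate under the assumptions named in the lead-in, not a reported result.


def implied_total_energy(part_pj: float, share: float) -> float:
    """Total energy implied when `part_pj` makes up `share` (0..1] of it."""
    if not 0.0 < share <= 1.0:
        raise ValueError("share must be in (0, 1]")
    return part_pj / share


DAC_DRIVER_PJ = 94.0      # DAC driver energy overhead reported in [10] (28 nm)
DAC_DRIVER_SHARE = 0.685  # share of total macro energy in [10]
REF_DRIVER_PJ = 2.9       # simulated SAR reference driver energy (65 nm)
ADC_PJ = 3.56             # example ADC energy quoted from [12]

total_pj = implied_total_energy(DAC_DRIVER_PJ, DAC_DRIVER_SHARE)
rest_pj = total_pj - DAC_DRIVER_PJ          # energy left for everything else
driver_to_adc = REF_DRIVER_PJ / ADC_PJ      # reference driver relative to one ADC

print(f"implied total energy in [10]: {total_pj:.1f} pJ")      # 137.2 pJ
print(f"energy outside the DAC drivers: {rest_pj:.1f} pJ")     # 43.2 pJ
print(f"reference driver / ADC energy: {driver_to_adc:.2f}")   # 0.81
```

Under these assumptions the DAC drivers in [10] consume more than twice the energy of the rest of the macro combined, and a conventional reference driver costs roughly four fifths of the energy of the ADC it serves, which is the motivation for a driver-free design.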