A 818–4094 TOPS/W Capacitor-Reconfigured Analog CIM for Unified Acceleration of CNNs and Transformers

IEEE Journal of Solid-State Circuits · Published: 2024-09-24 · DOI: 10.1109/JSSC.2024.3457898 · IF 5.6 · JCR Q1 (Engineering, Electrical & Electronic)
Kentaro Yoshioka

Abstract

The rapid evolution of machine learning has led to the emergence of diverse neural network architectures, such as CNNs, Transformers, and their hybrid models, each with unique computational precision requirements. Transformers, in particular, demand higher precision compared to CNNs. Existing analog compute-in-memory (ACIM) solutions primarily cater to CNNs and struggle to achieve the high precision necessary for Transformers, despite their promise in addressing the memory bottleneck. To bridge this gap, we propose a capacitor-reconfigured CIM (CR-CIM) macro that introduces dual-mode operation, dynamically switching between high-precision and high-efficiency modes based on the active DNN layer. In the CNN mode, the CR-CIM employs bit-parallel computation and an 8-bit ADC to maximize power efficiency, exploiting the inherent error tolerance of CNNs. In contrast, for the Transformer mode, the CR-CIM switches to bit-serial computation and a 10-bit ADC to boost the compute signal-to-noise ratio (CSNR), ensuring the higher precision required by Transformers. This dual-mode functionality of the proposed CR-CIM is enabled by three key technologies: 1) a novel CR-CIM architecture and cell structure; 2) a resource-efficient multi-bit driver for bit-parallel computation; and 3) a software-analog co-design (SAC) strategy for enhanced Transformer computation. Our CR-CIM prototype is the first ACIM design to enable optimized operation for both Transformers and CNNs. CR-CIM achieves 45-dB signal-to-quantization-noise ratio (SQNR) and 31-dB CSNR (8-bit input and 8-bit weight bit-serial MAC) in the Transformer mode and a peak power efficiency of 4094 TOPS/W (normalized to 1-bit × 1-bit MAC) in the CNN mode.
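The bit-serial Transformer mode described above — applying an 8-bit activation one bit-plane per cycle, digitizing each 1-bit analog partial sum with the column ADC, then shifting and accumulating digitally — can be illustrated with a toy numerical sketch. This is not the paper's circuit model: the column length `N`, the uniform mid-tread ADC model, and the data distributions are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256  # assumed MAC column length (illustrative, not from the paper)

def adc(v, bits, full_scale):
    """Uniform mid-tread quantizer standing in for the column ADC."""
    step = 2.0 * full_scale / (1 << bits)
    return np.round(v / step) * step

def bit_serial_mac(w, x, adc_bits):
    """8b x 8b MAC computed as 8 shifted 1-bit 'analog' MACs."""
    full_scale = max(np.sum(np.abs(w)), 1)  # max |1-bit partial sum|
    acc = 0.0
    for b in range(8):
        bit_plane = (x >> b) & 1                 # one input bit-plane
        partial = float(np.dot(w, bit_plane))    # analog 1-bit MAC
        acc += (1 << b) * adc(partial, adc_bits, full_scale)
    return acc

def sqnr_db(adc_bits, trials=200):
    """SQNR of the quantized MAC vs. the ideal digital result, in dB."""
    sig = noise = 0.0
    for _ in range(trials):
        w = rng.integers(-128, 128, N)   # 8-bit signed weights
        x = rng.integers(0, 256, N)      # 8-bit unsigned activations
        ideal = float(np.dot(w, x))
        est = bit_serial_mac(w, x, adc_bits)
        sig += ideal ** 2
        noise += (est - ideal) ** 2
    return 10.0 * np.log10(sig / max(noise, 1e-12))

sqnr_8b = sqnr_db(8)    # CNN-mode ADC resolution
sqnr_10b = sqnr_db(10)  # Transformer-mode ADC resolution
```

In this simplified model, moving from an 8-bit to a 10-bit ADC shrinks the quantization step of every bit-plane readout by 4×, cutting the quantization-noise power by roughly 16× — the same lever the Transformer mode pulls to raise CSNR at the cost of conversion energy.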
Source journal: IEEE Journal of Solid-State Circuits (Engineering: Electrical & Electronic)
CiteScore: 11.00 · Self-citation rate: 20.40% · Articles per year: 351 · Review time: 3–6 weeks
Journal description: The IEEE Journal of Solid-State Circuits publishes papers each month in the broad area of solid-state circuits with particular emphasis on transistor-level design of integrated circuits. It also provides coverage of topics such as circuit modeling, technology, systems design, layout, and testing that relate directly to IC design. Integrated circuits and VLSI are of principal interest; material related to discrete circuit design is seldom published. Experimental verification is strongly encouraged.