Title: A 818–4094 TOPS/W Capacitor-Reconfigured Analog CIM for Unified Acceleration of CNNs and Transformers
Authors: Kentaro Yoshioka
Journal: IEEE Journal of Solid-State Circuits, vol. 60, no. 5, pp. 1844-1855
Published: 2024-09-24
DOI: 10.1109/JSSC.2024.3457898
URL: https://ieeexplore.ieee.org/document/10689660/
Impact factor: 5.6 (JCR Q1, Engineering, Electrical & Electronic)
Citations: 0
Abstract
The rapid evolution of machine learning has led to the emergence of diverse neural network architectures, such as CNNs, Transformers, and their hybrid models, each with unique computational precision requirements. Transformers, in particular, demand higher precision than CNNs. Existing analog compute-in-memory (ACIM) solutions primarily cater to CNNs and struggle to achieve the high precision necessary for Transformers, despite their promise in addressing the memory bottleneck. To bridge this gap, we propose a capacitor-reconfigured CIM (CR-CIM) macro that introduces dual-mode operation, dynamically switching between high-precision and high-efficiency modes based on the active DNN layer. In the CNN mode, the CR-CIM employs bit-parallel computation and an 8-bit ADC to maximize power efficiency, exploiting the inherent error tolerance of CNNs. In contrast, in the Transformer mode, the CR-CIM switches to bit-serial computation and a 10-bit ADC to boost the compute signal-to-noise ratio (CSNR), ensuring the higher precision required by Transformers. This dual-mode functionality of the proposed CR-CIM is enabled by three key technologies: 1) a novel CR-CIM architecture and cell structure; 2) a resource-efficient multi-bit driver for bit-parallel computation; and 3) a software-analog co-design (SAC) strategy for enhanced Transformer computation. Our CR-CIM prototype is the first ACIM design to enable optimized operation for both Transformers and CNNs. CR-CIM achieves 45-dB signal-to-quantization-noise ratio (SQNR) and 31-dB CSNR (8-bit input and 8-bit weight bit-serial MAC) in the Transformer mode and a peak power efficiency of 4094 TOPS/W (normalized to 1-bit $\times$ 1-bit MAC) in the CNN mode.
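The contrast the abstract draws between bit-serial and bit-parallel computation, and between an 8-bit and a 10-bit ADC, can be illustrated with a small numerical sketch. This is not the CR-CIM circuit or the author's method: it is an idealized, purely digital model, and every name in it (`bit_serial_mac`, `bit_parallel_mac`, `adc_quantize`, `sqnr_db`) is hypothetical. It assumes unsigned 8-bit inputs, signed integer weights, and an ideal uniform quantizer standing in for the ADC, so the SQNR values it prints are not expected to match the measured 45 dB.

```python
import math
import random


def bit_parallel_mac(inputs, weights):
    """Ideal dot product with each multi-bit input applied in a single step."""
    return sum(x * w for x, w in zip(inputs, weights))


def bit_serial_mac(inputs, weights, input_bits=8):
    """Same dot product, processing one input bit-plane per cycle.

    Assumes unsigned inputs that fit in `input_bits` bits.
    """
    acc = 0
    for b in range(input_bits):
        # Accumulate the weights selected by bit b of every input.
        partial = sum(((x >> b) & 1) * w for x, w in zip(inputs, weights))
        # Shift-and-add: weight the partial sum by the bit significance.
        acc += partial * (1 << b)
    return acc


def adc_quantize(value, bits, full_scale=1.0):
    """Ideal uniform quantizer over [-full_scale, full_scale) (assumed ADC model)."""
    step = 2.0 * full_scale / (2 ** bits)
    clipped = min(max(value, -full_scale), full_scale - step)
    return round(clipped / step) * step


def sqnr_db(signal, quantized):
    """Signal-to-quantization-noise ratio in dB."""
    signal_power = sum(s * s for s in signal)
    noise_power = sum((s - q) ** 2 for s, q in zip(signal, quantized))
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.randrange(0, 256) for _ in range(64)]     # unsigned 8-bit inputs
    w = [rng.randrange(-128, 128) for _ in range(64)]  # signed 8-bit weights
    print("bit-serial equals bit-parallel:",
          bit_serial_mac(x, w) == bit_parallel_mac(x, w))

    samples = [rng.uniform(-1.0, 1.0) for _ in range(10000)]
    for bits in (8, 10):
        q = [adc_quantize(s, bits) for s in samples]
        print(f"{bits}-bit ideal quantizer SQNR: {sqnr_db(samples, q):.1f} dB")
```

In this idealized model the two MAC schemes give identical results. In an analog macro each bit-serial cycle would typically be digitized separately, which is one reason bit-serial operation trades throughput and energy for precision; the sketch does not model that noise or the multi-bit driver.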
About the journal:
The IEEE Journal of Solid-State Circuits publishes papers each month in the broad area of solid-state circuits with particular emphasis on transistor-level design of integrated circuits. It also provides coverage of topics such as circuits modeling, technology, systems design, layout, and testing that relate directly to IC design. Integrated circuits and VLSI are of principal interest; material related to discrete circuit design is seldom published. Experimental verification is strongly encouraged.