A 22 nm Floating-Point ReRAM Compute-in-Memory Macro Using Residue-Shared ADC for AI Edge Device

IF 5.6 | Zone 1 (Engineering & Technology) | Q1 (Engineering, Electrical & Electronic) | IEEE Journal of Solid-State Circuits | Pub Date: 2024-10-22 | DOI: 10.1109/JSSC.2024.3470211
Hung-Hsi Hsu;Tai-Hao Wen;Win-San Khwa;Wei-Hsing Huang;Zhao-En Ke;Yu-Hsiang Chin;Hua-Jin Wen;Yu-Chen Chang;Wei-Ting Hsu;Ashwin Sanjay Lele;Bo Zhang;Ping-Sheng Wu;Chung-Chuan Lo;Ren-Shuo Liu;Chih-Cheng Hsieh;Kea-Tiong Tang;Shih-Hsin Teng;Chung-Cheng Chou;Yu-Der Chih;Tsung-Yung Jonathan Chang;Meng-Fan Chang
IEEE Journal of Solid-State Circuits, vol. 60, no. 1, pp. 171-183. Full text: https://ieeexplore.ieee.org/document/10726927/

Abstract

Artificial intelligence (AI) edge devices increasingly require the enhanced accuracy of floating-point (FP) multiply-and-accumulate (MAC) operations as well as nonvolatile on-chip memory to minimize the movement of weight data in power-off mode. Designing non-volatile compute-in-memory (nvCIM) macros for FP operations imposes several challenges, including: 1) a tradeoff between inference accuracy and weight bit-width following pre-alignment; 2) long computing latency and high energy consumption; 3) large cell array current during computation; and 4) high multi-bit readout energy consumption. In this study, we devised four schemes to address these issues, including: 1) a kernel-wise weight pre-alignment (K-WPA); 2) a rescheduled multi-bit input compression (RS-MIC); 3) HRS-favored dual-sign-bit (HF-DSB); and 4) residue-shared analog-to-digital converter (RS-ADC). A 16 Mb resistive random access memory (ReRAM) nvCIM macro fabricated for FP operations using foundry-provided ReRAM (22 nm CMOS technology) achieved an efficiency of 34.2 TFLOPS/W under BF16-input, BF16-weight, and FP32-output and 31.4 TFLOPS/W under FP16-input, FP16-weight, and FP32-output.
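
The abstract names the first challenge as a tradeoff between inference accuracy and weight bit-width following pre-alignment, but does not describe how K-WPA works. The sketch below is therefore not the authors' scheme; it only illustrates generic exponent pre-alignment, under the assumption that a group of weights (here called a kernel) shares its largest exponent and that bits shifted out during alignment are truncated. The function names `prealign`, `dequantize`, and `max_abs_error` and the parameter `mant_bits` are hypothetical.

```python
import math


def prealign(weights, mant_bits):
    """Align a group of FP weights to the group's largest exponent.

    Returns (shared_exp, ints) such that each weight is approximately
    ints[i] * 2**(shared_exp - mant_bits). Illustrative assumption only;
    not the K-WPA scheme from the paper.
    """
    # math.frexp(w) returns (m, e) with w == m * 2**e and 0.5 <= |m| < 1.
    exps = [math.frexp(w)[1] for w in weights if w != 0.0]
    if not exps:
        return 0, [0] * len(weights)
    shared_exp = max(exps)
    scale = 2.0 ** (mant_bits - shared_exp)
    # int() truncates toward zero, mimicking bits lost when a small
    # weight is shifted right to match the shared exponent.
    ints = [int(w * scale) for w in weights]
    return shared_exp, ints


def dequantize(shared_exp, ints, mant_bits):
    """Map aligned integers back to floating-point values."""
    step = 2.0 ** (shared_exp - mant_bits)
    return [q * step for q in ints]


def max_abs_error(weights, mant_bits):
    """Largest absolute error introduced by pre-alignment at this width."""
    shared_exp, ints = prealign(weights, mant_bits)
    approx = dequantize(shared_exp, ints, mant_bits)
    return max(abs(w - a) for w, a in zip(weights, approx))


if __name__ == "__main__":
    kernel = [0.75, -0.125, 0.01]
    for bits in (4, 8, 12):
        print(bits, prealign(kernel, bits), max_abs_error(kernel, bits))
```

In this toy kernel, the smallest weight (0.01) is truncated to zero at 4 bits and only recovers some precision at wider widths, which is the kind of accuracy versus bit-width tension the abstract refers to.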

Source journal
IEEE Journal of Solid-State Circuits (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 11.00
Self-citation rate: 20.40%
Articles published: 351
Review time: 3-6 weeks
Journal description: The IEEE Journal of Solid-State Circuits publishes papers each month in the broad area of solid-state circuits with particular emphasis on transistor-level design of integrated circuits. It also provides coverage of topics such as circuits modeling, technology, systems design, layout, and testing that relate directly to IC design. Integrated circuits and VLSI are of principal interest; material related to discrete circuit design is seldom published. Experimental verification is strongly encouraged.