An Integer-Floating-Point Dual-Mode Gain-Cell Computing-in-Memory Macro for Advanced AI Edge Chips

IEEE Journal of Solid-State Circuits; Impact factor: 5.6; Zone 1 (Engineering & Technology); JCR Q1 (Engineering, Electrical & Electronic); Publication date: 2024-10-15; DOI: 10.1109/JSSC.2024.3470215

Ping-Chun Wu; Win-San Khwa; Jui-Jen Wu; Jian-Wei Su; Chuan-Jia Jhang; Ho-Yu Chen; Zhao-En Ke; Ting-Chien Chiu; Jun-Ming Hsu; Chiao-Yen Cheng; Yu-Chen Chen; Chung-Chuan Lo; Ren-Shuo Liu; Chih-Cheng Hsieh; Kea-Tiong Tang; Meng-Fan Chang

Published in: IEEE Journal of Solid-State Circuits, vol. 60, no. 1, pp. 158-170. Journal article; not open access. Source: https://ieeexplore.ieee.org/document/10716755/

Citations: 0

Abstract



This article presents a novel integer-floating-point (INT-FP) gain-cell (GC)-computing-in-memory (CIM) structure for high-precision multiply-and-accumulate (MAC) operations with high computational flexibility, energy efficiency, and inference accuracy. The proposed device employs: 1) a dual-mode zone-based input processing scheme (ZB-IPS) aimed at eliminating exponent subtraction in order to enhance energy and area efficiency (AEF); 2) a dual-mode local computing cell (DM-LCC) to reuse exponent addition as an adder tree stage for INT-MAC to enhance AEF in both INT and floating-point (FP) modes; and 3) a stationary-based two-port GC array (SB-TP-GCA) to enable concurrent data updates and computation while reducing system-to-CIM and internal data accesses to improve energy efficiency. A 16-nm FinFET 108-kb GC-CIM macro fabricated using 4T gain cells (GCs) achieved energy efficiency of 99.5 TOPS/W in INT-MAC operations involving 128 accumulations of 8b-input, 8b-weight, and 23b-output; and 46.4 TFLOPS/W in FP-MAC operations involving 64 accumulations of BF16-input, BF16-weight, and FP32-output.
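The 23b output width quoted for the INT-MAC mode can be checked with simple arithmetic. The sketch below is not taken from the paper: it assumes signed two's-complement 8b inputs and weights (the abstract does not state signedness), and the helper names `signed_bits_needed` and `int_mac_output_range` are hypothetical. Under that assumption, the largest product is (-128) x (-128) = 2^14, and 128 such products sum to 2^21, which needs exactly 23 signed bits.

```python
def signed_bits_needed(value: int) -> int:
    """Smallest two's-complement width that can represent `value`."""
    if value >= 0:
        return value.bit_length() + 1          # one extra bit for the sign
    return (-value - 1).bit_length() + 1       # e.g. -128 -> 7 + 1 = 8 bits


def int_mac_output_range(in_bits: int, w_bits: int, n_acc: int) -> tuple[int, int]:
    """Worst-case (min, max) of a sum of `n_acc` signed products.

    Assumption (not stated in the abstract): inputs and weights are
    signed two's-complement integers of the given widths.
    """
    in_min, in_max = -(1 << (in_bits - 1)), (1 << (in_bits - 1)) - 1
    w_min, w_max = -(1 << (w_bits - 1)), (1 << (w_bits - 1)) - 1
    # The extreme products occur at the corners of the input/weight ranges.
    products = [a * b for a in (in_min, in_max) for b in (w_min, w_max)]
    return n_acc * min(products), n_acc * max(products)


def int_mac_output_bits(in_bits: int, w_bits: int, n_acc: int) -> int:
    """Signed accumulator width that holds every possible MAC result."""
    lo, hi = int_mac_output_range(in_bits, w_bits, n_acc)
    return max(signed_bits_needed(lo), signed_bits_needed(hi))


if __name__ == "__main__":
    lo, hi = int_mac_output_range(8, 8, 128)
    print(lo, hi, int_mac_output_bits(8, 8, 128))
```

The same reasoning gives a quick rule of thumb: a signed 8b x 8b product fits in 16 bits, and accumulating 128 = 2^7 products adds 7 bits, for 16 + 7 = 23.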


Source journal

IEEE Journal of Solid-State Circuits (Engineering & Technology: Electrical and Electronic Engineering)

CiteScore: 11.00
Self-citation rate: 20.40%
Articles published: 351
Review time: 3-6 weeks

Journal description: The IEEE Journal of Solid-State Circuits publishes papers each month in the broad area of solid-state circuits, with particular emphasis on transistor-level design of integrated circuits. It also covers topics such as circuit modeling, technology, systems design, layout, and testing that relate directly to IC design. Integrated circuits and VLSI are of principal interest; material related to discrete circuit design is seldom published. Experimental verification is strongly encouraged.