An Integer-Floating-Point Dual-Mode Gain-Cell Computing-in-Memory Macro for Advanced AI Edge Chips
Authors: Ping-Chun Wu, Win-San Khwa, Jui-Jen Wu, Jian-Wei Su, Chuan-Jia Jhang, Ho-Yu Chen, Zhao-En Ke, Ting-Chien Chiu, Jun-Ming Hsu, Chiao-Yen Cheng, Yu-Chen Chen, Chung-Chuan Lo, Ren-Shuo Liu, Chih-Cheng Hsieh, Kea-Tiong Tang, Meng-Fan Chang
Journal: IEEE Journal of Solid-State Circuits, vol. 60, no. 1, pp. 158-170
Published: 2024-10-15
DOI: 10.1109/JSSC.2024.3470215
URL: https://ieeexplore.ieee.org/document/10716755/
This article presents a novel integer-floating-point (INT-FP) gain-cell (GC)-computing-in-memory (CIM) structure for high-precision multiply-and-accumulate (MAC) operations with high computational flexibility, energy efficiency, and inference accuracy. The proposed device employs: 1) a dual-mode zone-based input processing scheme (ZB-IPS) aimed at eliminating exponent subtraction in order to enhance energy and area efficiency (AEF); 2) a dual-mode local computing cell (DM-LCC) to reuse exponent addition as an adder tree stage for INT-MAC to enhance AEF in both INT and floating-point (FP) modes; and 3) a stationary-based two-port GC array (SB-TP-GCA) to enable concurrent data updates and computation while reducing system-to-CIM and internal data accesses to improve energy efficiency. A 16-nm FinFET 108-kb GC-CIM macro fabricated using 4T gain cells (GCs) achieved energy efficiency of 99.5 TOPS/W in INT-MAC operations involving 128 accumulations of 8b-input, 8b-weight, and 23b-output; and 46.4 TFLOPS/W in FP-MAC operations involving 64 accumulations of BF16-input, BF16-weight, and FP32-output.
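The 23b output width quoted for INT-MAC follows from the operand widths and the accumulation count. The sketch below is only a sanity check on that arithmetic, not a model of the macro itself; it assumes signed two's-complement operands (the abstract does not state signedness), and the function name `mac_output_bits` is hypothetical.

```python
def mac_output_bits(input_bits: int, weight_bits: int, accumulations: int) -> int:
    """Bits needed to hold a sum of `accumulations` signed products without overflow.

    Assumes signed two's-complement inputs and weights (an assumption,
    not stated in the abstract).
    """
    # A signed a-bit value times a signed b-bit value fits in a + b bits.
    product_bits = input_bits + weight_bits
    # Summing N terms adds ceil(log2(N)) bits; computed with integers only.
    growth_bits = (accumulations - 1).bit_length()
    return product_bits + growth_bits


# 8b input x 8b weight, 128 accumulations
print(mac_output_bits(8, 8, 128))  # → 23

# Worst case: all 128 products equal (-128) * (-128) = 16384
worst_case = 128 * (-128) * (-128)
print(worst_case <= 2**22 - 1)  # fits in a 23-bit signed value → True
print(worst_case <= 2**21 - 1)  # does not fit in 22 bits → False
```

Under these assumptions, 16 product bits plus 7 bits of accumulation growth gives 23 bits, which matches the output width reported in the abstract.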
About the journal:
The IEEE Journal of Solid-State Circuits publishes papers each month in the broad area of solid-state circuits with particular emphasis on transistor-level design of integrated circuits. It also provides coverage of topics such as circuits modeling, technology, systems design, layout, and testing that relate directly to IC design. Integrated circuits and VLSI are of principal interest; material related to discrete circuit design is seldom published. Experimental verification is strongly encouraged.