A 28-nm 16-kb Aggregation and Combination Computing-in-Memory Macro With Dual-Level Sparsity Modulation and Sparse-Tracking ADCs for GCNs
Zhaoyang Zhang; Yanqi Zhang; Feiran Liu; Zhichao Liu; Yinhai Gao; Yuchen Ma; Yutong Zhang; An Guo; Tianzhu Xiong; Jinwu Chen; Xi Chen; Bo Wang; Yuchen Tang; Jun Yang; Xin Si
IEEE Journal of Solid-State Circuits, vol. 60, no. 3, pp. 949-962. DOI: 10.1109/JSSC.2024.3472115. Published online: 2024-10-16.
Citations: 0
Abstract
Computing-in-memory (CIM) architectures have demonstrated remarkable potential in addressing the memory wall. However, previous CIMs were often designed for multiply-accumulate (MAC) operations, which poses numerous challenges when deploying graph convolutional networks (GCNs) on CIM macros. This work presents a compact 6T SRAM-based aggregation and combination CIM macro (ACCIM) using: 1) a cell array with compact 6T bitcells, local jump and computing cells (LJCCs), and sparse-tracking analog-to-digital converters (STADCs) to improve energy efficiency by exploiting both input sparsity and weight bit sparsity; 2) an LJCC and STADC to improve the signal margin; 3) a CIM macro architecture with an aggregation input unit (AGINU) to support both GCN aggregation and GCN combination/convolutional neural network (CNN) MAC; 4) a graph pruning algorithm that divides the graph data into memory-friendly subgraphs; and 5) an error modeling-based pre-training method to improve the inference accuracy. A fabricated 28-nm 16-kb charge-domain SRAM-CIM macro achieved an energy efficiency of 86.87 TOPS/W and an area efficiency of 2344 GOPS/mm² for GCNs with 4-bit degree input, 8-bit features, and 15-bit output.
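To make the two workloads the macro targets concrete, the sketch below writes a single GCN layer as its standard two phases: aggregation (the sparse, irregular neighbor sum the AGINU handles) followed by combination (a dense MAC identical in form to a CNN layer). This is the textbook GCN formulation H' = ReLU(Â X W) in plain NumPy, not the paper's circuit-level dataflow; the function name and the tiny 3-node graph are illustrative only.

```python
import numpy as np

# Illustrative sketch of the two GCN phases the ACCIM macro accelerates,
# written as dense NumPy math (NOT the paper's hardware dataflow).
# A = adjacency matrix, X = node features, W = layer weights.

def gcn_layer(A, X, W):
    # Add self-loops and symmetrically normalize:
    # A_hat = D^{-1/2} (A + I) D^{-1/2}
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

    # Aggregation: each node sums its (normalized) neighbors' features.
    # This access pattern is sparse and irregular -- the GCN-specific part.
    H_agg = A_hat @ X
    # Combination: a dense MAC, the same workload shape as a CNN layer.
    H = H_agg @ W
    return np.maximum(H, 0.0)  # ReLU

# Tiny 3-node path graph: 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.random.randn(3, 4)   # 4 input features per node
W = np.random.randn(4, 2)   # project to 2 output features
out = gcn_layer(A, X, W)
print(out.shape)  # prints (3, 2)
```

Separating the layer this way mirrors the abstract's point: aggregation and combination stress memory very differently, which is why supporting both in one CIM macro (rather than MAC only) matters for GCNs.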
About the journal:
The IEEE Journal of Solid-State Circuits publishes papers each month in the broad area of solid-state circuits, with particular emphasis on transistor-level design of integrated circuits. It also covers topics such as circuit modeling, technology, systems design, layout, and testing that relate directly to IC design. Integrated circuits and VLSI are of principal interest; material related to discrete circuit design is seldom published. Experimental verification is strongly encouraged.