Double MAC on a Cell: A 22-nm 8T-SRAM-Based Analog In-Memory Accelerator for Binary/Ternary Neural Networks Featuring Split Wordline

IEEE Open Journal of Circuits and Systems (IF 2.4; JCR Q2, Engineering, Electrical & Electronic) | Pub Date: 2024-10-17 | DOI: 10.1109/OJCAS.2024.3482469

Hiroto Tagata; Takashi Sato; Hiromitsu Awano

Volume 5, pages 328-340. IEEE Xplore: https://ieeexplore.ieee.org/document/10721281/

Citations: 0

Abstract

This paper proposes a novel 8T-SRAM-based computing-in-memory (CIM) accelerator for binary/ternary neural networks. The proposed split dual-port 8T-SRAM cell has two input ports and simultaneously performs two binary multiply-and-accumulate (MAC) operations, one on the left bitline and one on the right. This approach doubles throughput without significantly increasing area or power consumption, since the only area overhead compared to the conventional 8T-SRAM is two additional wordline (WL) wires. In addition, the proposed circuit supports both binary and ternary activation inputs, allowing a flexible trade-off between high energy efficiency and high inference accuracy depending on the application. The proposed SRAM macro consists of a $128 \times 128$ SRAM array that outputs the MAC results of 96 binary/ternary inputs and $96 \times 128$ binary weights as 1-to-5-bit digital values. Circuit performance was evaluated by post-layout simulation using a 22-nm process layout of the overall CIM macro. The proposed circuit is capable of high-speed operation at 1 GHz. It achieves a maximum area efficiency of 3320 TOPS/mm$^2$, which is $3.4\times$ higher than existing research, with a reasonable energy efficiency of 1471 TOPS/W. The simulated inference accuracies of the proposed circuit are 96.45%/97.67% on the MNIST dataset with a binary/ternary MLP model, and 86.32%/88.56% on the CIFAR-10 dataset with a binary/ternary VGG-like CNN model.
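As a rough functional illustration of the computation the abstract describes (96 binary/ternary inputs multiplied against a 96 x 128 binary weight matrix, with each column result digitized to 1 to 5 bits), the following is a minimal behavioral sketch. It does not model the analog bitline circuit or the split-wordline cell. The value encodings (weights in {-1, +1}, binary activations in {-1, +1}, ternary activations in {-1, 0, +1}) and the uniform quantizer are assumptions made for illustration, and the function names are hypothetical; the abstract does not specify any of them.

```python
import random

N_INPUTS = 96    # activations per MAC (from the abstract)
N_COLUMNS = 128  # weight columns (from the abstract)


def mac_columns(activations, weights):
    """Return one signed dot product per weight column.

    activations: list of values in {-1, +1} (binary) or {-1, 0, +1} (ternary).
    weights: nested list, len(activations) rows by n_cols columns, values in {-1, +1}.
    These encodings are assumptions, not taken from the paper.
    """
    assert len(activations) == len(weights)
    n_cols = len(weights[0])
    return [
        sum(a * row[c] for a, row in zip(activations, weights))
        for c in range(n_cols)
    ]


def quantize(mac_value, bits, full_scale=N_INPUTS):
    """Map a MAC value in [-full_scale, +full_scale] to an unsigned code.

    Assumed uniform quantizer with 2**bits equal bins; with bits=1 it
    reduces to a sign comparison (code 1 when mac_value >= 0).
    """
    levels = 1 << bits
    code = (mac_value + full_scale) * levels // (2 * full_scale)
    return min(levels - 1, max(0, code))


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [[rng.choice((-1, 1)) for _ in range(N_COLUMNS)]
               for _ in range(N_INPUTS)]
    ternary_in = [rng.choice((-1, 0, 1)) for _ in range(N_INPUTS)]
    sums = mac_columns(ternary_in, weights)
    codes = [quantize(s, bits=5) for s in sums]
    print(sums[:4], codes[:4])
```

In this sketch a zero-valued ternary activation simply contributes nothing to the column sum, which is one plausible reading of why ternary inputs can improve accuracy over binary ones at some extra cost.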


Source journal

IEEE open journal of circuits and systems

Self-citation rate

0.00%

Number of publications

Review time

19 weeks