Straightforward data transfer in a blockwise dataflow for an analog RRAM-based CIM system

IF 1.9 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Frontiers in electronics Pub Date : 2023-04-17 DOI:10.3389/felec.2023.1129675

Yuyi Liu, B. Gao, Peng Yao, Qi Liu, Qingtian Zhang, Dong Wu, Jianshi Tang, H. Qian, Huaqiang Wu

{"title":"Straightforward data transfer in a blockwise dataflow for an analog RRAM-based CIM system","authors":"Yuyi Liu, B. Gao, Peng Yao, Qi Liu, Qingtian Zhang, Dong Wu, Jianshi Tang, H. Qian, Huaqiang Wu","doi":"10.3389/felec.2023.1129675","DOIUrl":null,"url":null,"abstract":"Analog resistive random-access memory (RRAM)-based computation-in-memory (CIM) technology is promising for constructing artificial intelligence (AI) with high energy efficiency and excellent scalability. However, the large overhead of analog-to-digital converters (ADCs) is a key limitation. In this work, we propose a novel LINKAGE architecture that eliminates PE-level ADCs and leverages an analog data transfer module to implement inter-array data processing. A blockwise dataflow is further proposed to accelerate convolutional neural networks (CNNs) to speed up compute-intensive layers and solve the unbalanced pipeline problem. To obtain accurate and reliable benchmark results, key component modules, such as straightforward link (SFL) modules and Tile-level ADCs, are designed in standard 28 nm CMOS technology. The evaluation shows that LINKAGE outperforms the conventional ADC/DAC-based architecture with a 2.07×∼11.22× improvement in throughput, 2.45×∼7.00× in energy efficiency, and 22%–51% reduction in the area overhead while maintaining accuracy. Our LINKAGE architecture can achieve 22.9∼24.4 TOPS/W energy efficiency (4b-IN/4b-W) and 1.82 ∼4.53 TOPS throughput with the blockwise method. This work demonstrates a new method for significantly improving the energy efficiency of CIM chips, which can be applied to general CNNs/FCNNs.","PeriodicalId":73081,"journal":{"name":"Frontiers in electronics","volume":" ","pages":""},"PeriodicalIF":1.9000,"publicationDate":"2023-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in electronics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/felec.2023.1129675","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

Abstract

Analog resistive random-access memory (RRAM)-based computation-in-memory (CIM) technology is promising for constructing artificial intelligence (AI) with high energy efficiency and excellent scalability. However, the large overhead of analog-to-digital converters (ADCs) is a key limitation. In this work, we propose a novel LINKAGE architecture that eliminates PE-level ADCs and leverages an analog data transfer module to implement inter-array data processing. A blockwise dataflow is further proposed to accelerate convolutional neural networks (CNNs) to speed up compute-intensive layers and solve the unbalanced pipeline problem. To obtain accurate and reliable benchmark results, key component modules, such as straightforward link (SFL) modules and Tile-level ADCs, are designed in standard 28 nm CMOS technology. The evaluation shows that LINKAGE outperforms the conventional ADC/DAC-based architecture with a 2.07×∼11.22× improvement in throughput, 2.45×∼7.00× in energy efficiency, and 22%–51% reduction in the area overhead while maintaining accuracy. Our LINKAGE architecture can achieve 22.9∼24.4 TOPS/W energy efficiency (4b-IN/4b-W) and 1.82 ∼4.53 TOPS throughput with the blockwise method. This work demonstrates a new method for significantly improving the energy efficiency of CIM chips, which can be applied to general CNNs/FCNNs.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于模拟RRAM的CIM系统的块数据流中的直接数据传输

基于模拟电阻随机存取存储器（RRAM）的存储器中计算（CIM）技术有望构建具有高能效和良好可扩展性的人工智能。然而，模数转换器（ADC）的大开销是一个关键限制。在这项工作中，我们提出了一种新的LINKAGE架构，该架构消除了PE级ADC，并利用模拟数据传输模块来实现阵列间数据处理。进一步提出了一种分块数据流来加速卷积神经网络（CNNs），以加速计算密集层并解决不平衡管道问题。为了获得准确可靠的基准测试结果，关键组件模块，如直接链路（SFL）模块和平铺级ADC，采用标准28 nm CMOS技术设计。评估表明，LINKAGE在保持精度的同时，吞吐量提高了2.07×～11.22倍，能效提高了2.45×～7.00倍，面积开销降低了22%～51%，优于传统的基于ADC/DAC的架构。我们的LINKAGE架构可以通过分块方法实现22.9～24.4 TOPS/W能效（4b IN/4b-W）和1.82～4.53 TOPS吞吐量。这项工作展示了一种显著提高CIM芯片能效的新方法，该方法可应用于通用的CNNs/FCNN。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Frontiers in electronics

自引率

0.00%

发文量