{"title":"一种面向边缘智能的节能时复用内存计算架构","authors":"Rui Xiao;Wenyu Jiang;Piew Yoong Chee","doi":"10.1109/JXCDC.2022.3206879","DOIUrl":null,"url":null,"abstract":"The growing data volume and complexity of deep neural networks (DNNs) require new architectures to surpass the limitation of the von-Neumann bottleneck, with computing-in-memory (CIM) as a promising direction for implementing energy-efficient neural networks. However, CIM’s peripheral sensing circuits are usually power- and area-hungry components. We propose a time-multiplexing CIM architecture (TM-CIM) based on memristive analog computing to share the peripheral circuits and process one column at a time. The memristor array is arranged in a column-wise manner that avoids wasting power/energy on unselected columns. In addition, digital-to-analog converter (DAC) power and energy efficiency, which turns out to be an even greater overhead than analog-to-digital converter (ADC), can be fine-tuned in TM-CIM for significant improvement. For a 256*256 crossbar array with a typical setting, TM-CIM saves \n<inline-formula> <tex-math>$18.4\\times $ </tex-math></inline-formula>\n in energy with 0.136 pJ/MAC efficiency, and \n<inline-formula> <tex-math>$19.9\\times $ </tex-math></inline-formula>\n area for 1T1R case and \n<inline-formula> <tex-math>$15.9\\times $ </tex-math></inline-formula>\n for 2T2R case. Performance estimation on VGG-16 indicates that TM-CIM can save over \n<inline-formula> <tex-math>$16\\times $ </tex-math></inline-formula>\n area. 
A tradeoff between the chip area, peak power, and latency is also presented, with a proposed scheme to further reduce the latency on VGG-16, without significantly increasing chip area and peak power.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"8 2","pages":"111-118"},"PeriodicalIF":2.0000,"publicationDate":"2022-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6570653/9969523/09893208.pdf","citationCount":"1","resultStr":"{\"title\":\"An Energy Efficient Time-Multiplexing Computing-in-Memory Architecture for Edge Intelligence\",\"authors\":\"Rui Xiao;Wenyu Jiang;Piew Yoong Chee\",\"doi\":\"10.1109/JXCDC.2022.3206879\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The growing data volume and complexity of deep neural networks (DNNs) require new architectures to surpass the limitation of the von-Neumann bottleneck, with computing-in-memory (CIM) as a promising direction for implementing energy-efficient neural networks. However, CIM’s peripheral sensing circuits are usually power- and area-hungry components. We propose a time-multiplexing CIM architecture (TM-CIM) based on memristive analog computing to share the peripheral circuits and process one column at a time. The memristor array is arranged in a column-wise manner that avoids wasting power/energy on unselected columns. In addition, digital-to-analog converter (DAC) power and energy efficiency, which turns out to be an even greater overhead than analog-to-digital converter (ADC), can be fine-tuned in TM-CIM for significant improvement. 
For a 256*256 crossbar array with a typical setting, TM-CIM saves \\n<inline-formula> <tex-math>$18.4\\\\times $ </tex-math></inline-formula>\\n in energy with 0.136 pJ/MAC efficiency, and \\n<inline-formula> <tex-math>$19.9\\\\times $ </tex-math></inline-formula>\\n area for 1T1R case and \\n<inline-formula> <tex-math>$15.9\\\\times $ </tex-math></inline-formula>\\n for 2T2R case. Performance estimation on VGG-16 indicates that TM-CIM can save over \\n<inline-formula> <tex-math>$16\\\\times $ </tex-math></inline-formula>\\n area. A tradeoff between the chip area, peak power, and latency is also presented, with a proposed scheme to further reduce the latency on VGG-16, without significantly increasing chip area and peak power.\",\"PeriodicalId\":54149,\"journal\":{\"name\":\"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits\",\"volume\":\"8 2\",\"pages\":\"111-118\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2022-09-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/iel7/6570653/9969523/09893208.pdf\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/9893208/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal on Exploratory Solid-State Computational Devices and 
Circuits","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/9893208/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
An Energy Efficient Time-Multiplexing Computing-in-Memory Architecture for Edge Intelligence
The growing data volume and complexity of deep neural networks (DNNs) require new architectures that surpass the von Neumann bottleneck, and computing-in-memory (CIM) is a promising direction for implementing energy-efficient neural networks. However, CIM's peripheral sensing circuits are usually power- and area-hungry components. We propose a time-multiplexing CIM architecture (TM-CIM) based on memristive analog computing that shares the peripheral circuits and processes one column at a time. The memristor array is arranged in a column-wise manner that avoids wasting power/energy on unselected columns. In addition, the power and energy efficiency of the digital-to-analog converters (DACs), which turn out to be an even greater overhead than the analog-to-digital converters (ADCs), can be fine-tuned in TM-CIM for significant improvement. For a $256\times256$ crossbar array with a typical setting, TM-CIM saves $18.4\times$ in energy at 0.136 pJ/MAC efficiency, and saves $19.9\times$ in area for the 1T1R case and $15.9\times$ for the 2T2R case. Performance estimation on VGG-16 indicates that TM-CIM can save over $16\times$ in area. A tradeoff between chip area, peak power, and latency is also presented, along with a scheme that further reduces latency on VGG-16 without significantly increasing chip area or peak power.
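The central idea in the abstract — a memristor crossbar computes analog dot products, and TM-CIM time-multiplexes one shared set of peripheral converters over the columns instead of sensing all columns in parallel — can be illustrated with a toy numerical model. Everything below (the conductance range, voltage range, and the one-ADC-versus-N-ADCs bookkeeping) is an illustrative assumption for the sketch, not a figure or circuit from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256                                     # 256x256 crossbar, as in the abstract
G = rng.uniform(1e-6, 1e-4, size=(N, N))    # memristor conductances in siemens (assumed range)
V = rng.uniform(0.0, 0.2, size=N)           # row input voltages in volts (assumed range)

def mac_time_multiplexed(V, G):
    """TM-CIM-style readout sketch: select one column per time slot, so a
    single shared sense chain is reused and unselected columns carry no
    read current. Each column current is I_j = sum_i V_i * G_ij."""
    out = np.empty(G.shape[1])
    for col in range(G.shape[1]):           # one column per cycle -> N cycles of latency
        out[col] = V @ G[:, col]            # analog dot product for the selected column
    return out

I_tm = mac_time_multiplexed(V, G)
I_parallel = V @ G                          # conventional fully parallel readout, same math
assert np.allclose(I_tm, I_parallel)        # identical MAC results, different hardware cost

# Rough bookkeeping of the tradeoff the abstract describes: both schemes
# perform N column conversions in total, but the parallel design needs N
# ADC/sense chains active at once, while TM-CIM needs one.
cycles_tm = N                               # latency cost of time multiplexing
peripheral_area_ratio = N / 1               # N sense chains vs. 1 shared chain (first-order)
```

The sketch makes the tradeoff concrete: the outputs are bit-identical, so time multiplexing trades an $N$-cycle latency penalty for a roughly $N$-fold reduction in peripheral area and peak power, which is the area/latency balance the paper tunes further for VGG-16.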