Energy-efficient neural network design using memristive MAC unit

Frontiers in Electronics (IF 2.1, Q3, Engineering, Electrical & Electronic) | Pub Date: 2022-09-26 | DOI: 10.3389/felec.2022.877629

Shengqi Yu, Thanasin Bunnam, S. Triamlumlerd, Manoch Pracha, F. Xia, R. Shafik, A. Yakovlev


Citations: 1

Abstract

Artificial intelligence applications implemented with neural networks (NNs) require extensive arithmetic capabilities, provided by multiply-accumulate (MAC) units. Traditional designs based on voltage-mode circuits feature complex logic chains for purposes such as carry processing. Additionally, because a separate memory block is used (e.g., in a von Neumann architecture), data movement creates on-chip communication bottlenecks. Furthermore, conventional multipliers encode both operands in the same physical quantity, which is either low-cost to update or low-cost to hold, but not both. This may be significant for low-energy edge operations. In this paper, we propose a mixed-signal MAC unit design with in-memory computing to improve both latency and energy. The design is based on a single-bit multiplication cell consisting of a number of memristors and a single transistor switch (1TxM), arranged in a crossbar structure that implements the long-multiplication algorithm. The key innovation is that one operand is encoded in easy-to-update voltage and the other in non-volatile memristor conductance. This targets operations such as machine learning, which feature asymmetric requirements for operand updates. Ohm’s Law and Kirchhoff’s current law (KCL) perform the multiplication in the analog domain. When implemented as part of an NN, the MAC unit incorporates a current-to-digital stage to produce a multi-bit voltage-mode output in the same format as the input. The computation latency consists of the memory writing and result encoding operations, with the Ohm’s Law and KCL operations contributing negligible delay. Compared with other memristor-based multipliers, the proposed design shows an order-of-magnitude latency improvement in 4-bit implementations, partly because of the Ohm’s Law and KCL time savings and partly because of the short writing operations for the frequently updated operand, which is represented by voltages. In addition, the energy consumption per multiplication cycle is shown to improve by 74%–99% in corner cases. To investigate the usefulness of this MAC design in machine learning applications, its input/output relationships are characterized and used in multi-layer perceptrons that classify the well-known handwritten digit dataset MNIST. This case study applies quantization-aware training and includes the non-ideal effects of the MAC unit so that the NN can learn them and preserve its high accuracy. The simulation results show that the NN using the proposed MAC unit yields an accuracy of 93%, which is only 1% lower than its baseline.
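The operand encoding described in the abstract can be illustrated with a small numerical model. The following is a minimal sketch, not the authors' circuit: it assumes an idealized cell in which an input bit selects a read voltage, a stored bit selects a high or low conductance, cells of equal bit weight share one summing line, and an ideal current-to-digital stage counts unit currents. The constants `V_READ`, `G_ON`, and `G_OFF` and all function names are hypothetical values chosen for illustration.

```python
V_READ = 0.2   # volts; assumed read voltage for a logic-1 input bit
G_ON = 1e-4    # siemens; assumed conductance storing a logic 1
G_OFF = 1e-7   # siemens; assumed conductance storing a logic 0


def to_bits(value, width):
    """Return the bits of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def column_currents(a, b, width=4):
    """Model the analog step: operand a as voltages, operand b as conductances."""
    volts = [bit * V_READ for bit in to_bits(a, width)]
    conds = [G_ON if bit else G_OFF for bit in to_bits(b, width)]
    currents = [0.0] * (2 * width - 1)
    for i, v in enumerate(volts):
        for j, g in enumerate(conds):
            # Ohm's law gives each cell current; KCL sums cells of equal weight.
            currents[i + j] += v * g
    return currents


def current_to_digital(current):
    """Idealized converter: number of unit currents flowing in one column."""
    return round(current / (V_READ * G_ON))


def multiply(a, b, width=4):
    """Long multiplication: weight each column count by its power of two."""
    counts = [current_to_digital(i) for i in column_currents(a, b, width)]
    return sum(count << weight for weight, count in enumerate(counts))
```

Under these assumptions, `multiply(13, 11)` returns 143. The small leakage through `G_OFF` cells stays well below half a unit current per column, so the rounding step absorbs it; a real converter would need to budget for this and other non-idealities.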


