DeltaRNN: A Power-efficient Recurrent Neural Network Accelerator

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2018-02-15 DOI:10.1145/3174243.3174261

Chang Gao, Daniel Neil, Enea Ceolini, Shih-Chii Liu, T. Delbrück

{"title":"DeltaRNN: A Power-efficient Recurrent Neural Network Accelerator","authors":"Chang Gao, Daniel Neil, Enea Ceolini, Shih-Chii Liu, T. Delbrück","doi":"10.1145/3174243.3174261","DOIUrl":null,"url":null,"abstract":"Recurrent Neural Networks (RNNs) are widely used in speech recognition and natural language processing applications because of their capability to process temporal sequences. Because RNNs are fully connected, they require a large number of weight memory accesses, leading to high power consumption. Recent theory has shown that an RNN delta network update approach can reduce memory access and computes with negligible accuracy loss. This paper describes the implementation of this theoretical approach in a hardware accelerator called \"DeltaRNN\" (DRNN). The DRNN updates the output of a neuron only when the neuron»s activation changes by more than a delta threshold. It was implemented on a Xilinx Zynq-7100 FPGA. FPGA measurement results from a single-layer RNN of 256 Gated Recurrent Unit (GRU) neurons show that the DRNN achieves 1.2 TOp/s effective throughput and 164 GOp/s/W power efficiency. The delta update leads to a 5.7x speedup compared to a conventional RNN update because of the sparsity created by the DN algorithm and the zero-skipping ability of DRNN.","PeriodicalId":164936,"journal":{"name":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"147 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"108","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3174243.3174261","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 108

Abstract

Recurrent Neural Networks (RNNs) are widely used in speech recognition and natural language processing applications because of their capability to process temporal sequences. Because RNNs are fully connected, they require a large number of weight memory accesses, leading to high power consumption. Recent theory has shown that an RNN delta network update approach can reduce memory access and computes with negligible accuracy loss. This paper describes the implementation of this theoretical approach in a hardware accelerator called "DeltaRNN" (DRNN). The DRNN updates the output of a neuron only when the neuron»s activation changes by more than a delta threshold. It was implemented on a Xilinx Zynq-7100 FPGA. FPGA measurement results from a single-layer RNN of 256 Gated Recurrent Unit (GRU) neurons show that the DRNN achieves 1.2 TOp/s effective throughput and 164 GOp/s/W power efficiency. The delta update leads to a 5.7x speedup compared to a conventional RNN update because of the sparsity created by the DN algorithm and the zero-skipping ability of DRNN.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

DeltaRNN:一种高效的循环神经网络加速器

递归神经网络(RNNs)因其处理时间序列的能力而广泛应用于语音识别和自然语言处理领域。由于rnn是完全连接的，因此需要大量的权重内存访问，从而导致高功耗。最近的理论表明，RNN增量网络更新方法可以减少内存访问，计算精度损失可以忽略不计。本文描述了这一理论方法在一个名为“DeltaRNN”(DRNN)的硬件加速器中的实现。只有当神经元的激活变化超过一个增量阈值时，DRNN才会更新神经元的输出。它是在Xilinx Zynq-7100 FPGA上实现的。对256个门控循环单元(GRU)神经元的单层RNN的FPGA测量结果表明，该RNN达到了1.2 TOp/s的有效吞吐量和164 GOp/s/W的功率效率。与传统的RNN更新相比，增量更新导致5.7倍的加速，因为DN算法创建的稀疏性和DRNN的跳零能力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

自引率

0.00%

发文量

期刊最新文献

Architecture and Circuit Design of an All-Spintronic FPGA Session details: Session 6: High Level Synthesis 2 A FPGA Friendly Approximate Computing Framework with Hybrid Neural Networks: (Abstract Only) Software/Hardware Co-design for Multichannel Scheduling in IEEE 802.11p MLME: (Abstract Only) Session details: Special Session: Deep Learning