A 8.93-TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity With All Parameters Stored On-Chip

ESSCIRC 2019 - IEEE 45th European Solid State Circuits Conference (ESSCIRC), Pub Date: 2019-09-01, DOI: 10.1109/ESSCIRC.2019.8902809

Deepak Kadetotad, Visar Berisha, C. Chakrabarti, Jae-sun Seo

Citations: 2

Abstract

Long short-term memory (LSTM) networks are widely used for speech applications but pose difficulties for efficient implementation on hardware due to large weight storage requirements. We present an energy-efficient LSTM recurrent neural network (RNN) accelerator, featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based block-wise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights while achieving minimal accuracy loss. The prototype chip fabricated in 65-nm LP CMOS achieves 8.93/7.22 TOPS/W for 2-/3-layer LSTM RNNs trained with HCGS for TIMIT/TED-LIUM corpora.
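The abstract does not detail how HCGS selects blocks, so the following is only an illustrative sketch of how a block-wise recursive scheme could reach 16× fewer weights. It assumes two levels of blocks, that each level keeps one quarter of the blocks in every block row, and that the kept blocks are chosen at random. The function names (`hcgs_mask`, `density`), the block sizes (16 and 4), the keep fraction, and the random selection are all assumptions made for illustration, not details taken from the paper.

```python
import random


def hcgs_mask(rows, cols, block_sizes, keep_fraction, seed=0):
    """Build a 0/1 weight mask by recursively keeping a fixed fraction of
    blocks in every block row (hypothetical sketch, not the paper's code).

    Assumes rows and cols are divisible by the first block size, and each
    block size is divisible by the next one.
    """
    rng = random.Random(seed)
    mask = [[0] * cols for _ in range(rows)]

    def select(r0, c0, height, width, level):
        # Past the last level: every weight in this surviving block is kept.
        if level == len(block_sizes):
            for r in range(r0, r0 + height):
                for c in range(c0, c0 + width):
                    mask[r][c] = 1
            return
        b = block_sizes[level]
        n_block_cols = width // b
        n_keep = max(1, int(n_block_cols * keep_fraction))
        # For each block row, keep n_keep randomly chosen blocks and recurse.
        for br in range(height // b):
            for bc in rng.sample(range(n_block_cols), n_keep):
                select(r0 + br * b, c0 + bc * b, b, b, level + 1)

    select(0, 0, rows, cols, 0)
    return mask


def density(mask):
    """Fraction of weights that survive the mask."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total


if __name__ == "__main__":
    m = hcgs_mask(64, 64, block_sizes=(16, 4), keep_fraction=0.25)
    print(density(m))  # 1/4 at each of two levels gives 1/16 of the weights
```

Under these assumptions, every row of the matrix keeps the same number of nonzero weights, which is the kind of regularity that makes a block-sparse layout straightforward to index in hardware.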


