Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks

2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). Pub Date: 2019-02-01. DOI: 10.1109/HPCA.2019.00029

Xiaowei Wang, Jiecao Yu, C. Augustine, R. Iyer, R. Das

Citations: 25

Abstract

We propose Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks, an in-SRAM architecture that accelerates Convolutional Neural Network (CNN) inference by exploiting network redundancy and massive parallelism. Network redundancy is exploited in two ways. First, we prune and fine-tune the trained network model and develop two distinct methods, coalescing and overlapping, to run inference efficiently on sparse models. Second, we propose an architecture that supports network models with reduced bit widths by leveraging bit-serial computation. Our proposed architecture achieves a 17.7×/3.7× speedup over a server-class CPU/GPU and a 1.6× speedup over the relevant in-cache accelerator, with a 2% area overhead per processor die and no loss in top-1 accuracy for AlexNet. With a relaxed accuracy limit, our tunable architecture achieves higher speedups.

Keywords: In-Memory Computing; Cache; Neural Network Pruning; Low Precision Neural Network.
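Why reduced bit widths pay off in a bit-serial design can be illustrated in software. The sketch below is a generic bit-plane decomposition of an integer dot product, not the paper's in-SRAM circuit. The function name `bit_serial_dot`, the restriction to unsigned integer weights, and the `steps` counter are illustrative assumptions. Each pass handles one weight bit across all elements (in hardware, across all lanes in parallel), so the number of passes grows linearly with the declared weight bit width.

```python
from typing import Sequence, Tuple


def bit_serial_dot(activations: Sequence[int], weights: Sequence[int],
                   weight_bits: int) -> Tuple[int, int]:
    """Integer dot product computed one weight bit-plane at a time.

    Illustrative sketch only: assumes unsigned integer weights that fit in
    `weight_bits` bits. Returns (result, steps), where `steps` counts the
    bit-plane passes and therefore equals `weight_bits`.
    """
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have the same length")
    limit = 1 << weight_bits
    if any(w < 0 or w >= limit for w in weights):
        raise ValueError(f"weights must be unsigned and fit in {weight_bits} bits")

    acc = 0
    steps = 0
    for j in range(weight_bits):  # least significant bit first
        # Sum the activations whose weight has bit j set; a bit-serial
        # accelerator would perform this selection in every lane at once.
        plane_sum = sum(a for a, w in zip(activations, weights) if (w >> j) & 1)
        acc += plane_sum << j  # scale the partial sum by 2**j
        steps += 1
    return acc, steps
```

For example, `bit_serial_dot([3, 1, 4, 1, 5], [2, 7, 1, 0, 3], 3)` returns `(32, 3)`, while declaring the same weights as 8-bit returns `(32, 8)`. The result is unchanged, but the number of passes grows with the declared width, which is the cost a reduced-bit-width model avoids. Zero (pruned) weights contribute nothing to any bit-plane sum.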
