Improved Hybrid Memory Cube for Weight-Sharing Deep Convolutional Neural Networks
Hao Zhang, Jiongrui He, S. Ko
2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), March 2019
DOI: 10.1109/AICAS.2019.8771540
Citations: 3
Abstract
In recent years, many deep neural network accelerator architectures have been proposed to improve the performance of processing deep neural network models. However, memory bandwidth remains the major issue and performance bottleneck of deep neural network accelerators. Emerging 3D memory, such as the hybrid memory cube (HMC), and processing-in-memory techniques provide new solutions for deep neural network implementation. In this paper, a novel HMC architecture is proposed for weight-sharing deep convolutional neural networks in order to resolve the memory bandwidth bottleneck during neural network implementation. The proposed HMC is based on the conventional HMC architecture with only minor changes. In the logic layer, the vault controller is modified to enable parallel vault access. The weight parameters of the pre-trained convolutional neural network are quantized to 16 shared values. During processing, the activations associated with each shared weight are accumulated, and only the accumulated results are transferred to the processing elements to be multiplied by the weights. With the proposed architecture, data transfer between main memory and the processing elements is reduced, and the throughput of convolution operations is improved by 30% compared to an HMC-based multiply-accumulate design.
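The accumulate-then-multiply idea behind weight sharing can be sketched as follows. This is a minimal illustration, not the paper's implementation: with weights quantized to 16 shared values, a dot product sum_i w[i]·x[i] regroups as sum_k c[k]·(sum of the x[i] whose weight maps to codebook entry k), so only 16 multiplications remain per output. All names below (`shared_weight_dot`, `codebook`, `weight_idx`) are illustrative assumptions.

```python
import numpy as np

def shared_weight_dot(x, weight_idx, codebook):
    """Dot product via per-codebook-entry accumulation (weight sharing).

    x          : activation vector, shape (N,)
    weight_idx : codebook index of each weight, shape (N,), values in 0..15
    codebook   : the 16 shared weight values, shape (16,)
    """
    # Accumulate activations that share the same weight value.
    # In the proposed HMC this accumulation would happen near memory,
    # so only 16 partial sums cross to the processing elements.
    sums = np.zeros(len(codebook))
    np.add.at(sums, weight_idx, x)  # unbuffered scatter-add by weight index
    # Only len(codebook) multiplications are left for the PE.
    return float(np.dot(codebook, sums))

# Sanity check against a plain multiply-accumulate dot product.
rng = np.random.default_rng(0)
codebook = np.linspace(-1.0, 1.0, 16)          # 16 shared weight values
weight_idx = rng.integers(0, 16, size=1000)     # quantized weight indices
x = rng.standard_normal(1000)                   # activations
assert np.isclose(shared_weight_dot(x, weight_idx, codebook),
                  np.dot(codebook[weight_idx], x))
```

The regrouping is exact, not an approximation: for a length-1000 dot product it trades 1000 multiplications for 1000 additions plus 16 multiplications, which is why transferring only the 16 accumulated sums reduces bandwidth to the processing elements.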