Improved Hybrid Memory Cube for Weight-Sharing Deep Convolutional Neural Networks
Hao Zhang, Jiongrui He, S. Ko
2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), March 2019
DOI: 10.1109/AICAS.2019.8771540
Citations: 3
Abstract
In recent years, many deep neural network accelerator architectures have been proposed to improve the performance of processing deep neural network models. However, memory bandwidth remains the major issue and performance bottleneck of deep neural network accelerators. Emerging 3D memory, such as the hybrid memory cube (HMC), and processing-in-memory techniques provide new solutions for deep neural network implementation. In this paper, a novel HMC architecture is proposed for weight-sharing deep convolutional neural networks in order to resolve the memory bandwidth bottleneck during neural network implementation. The proposed HMC is based on the conventional HMC architecture with only minor changes. In the logic layer, the vault controller is modified to enable parallel vault access. The weight parameters of the pre-trained convolutional neural network are quantized to 16 shared values. During processing, the activations associated with each shared weight are accumulated, and only the accumulated results are transferred to the processing elements to be multiplied by the weights. With the proposed architecture, data transfer between main memory and the processing elements is reduced, and the throughput of convolution operations is improved by 30% compared to an HMC-based multiply-accumulate design.
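The accumulate-then-multiply idea behind weight sharing can be sketched as follows. This is a minimal illustration, not the paper's implementation: with weights quantized to 16 shared values, a dot product sum_i w[i]·x[i] regroups as sum_k c[k]·(sum of the x[i] whose weight maps to codebook entry k), so only 16 multiplications remain per output. All names below (`shared_weight_dot`, `codebook`, `weight_idx`) are illustrative assumptions.

```python
import numpy as np

def shared_weight_dot(x, weight_idx, codebook):
    """Dot product via per-codebook-entry accumulation (weight sharing).

    x          : activation vector, shape (N,)
    weight_idx : codebook index of each weight, shape (N,), values in 0..15
    codebook   : the 16 shared weight values, shape (16,)
    """
    # Accumulate activations that share the same weight value.
    # In the proposed HMC this accumulation would happen near memory,
    # so only 16 partial sums cross to the processing elements.
    sums = np.zeros(len(codebook))
    np.add.at(sums, weight_idx, x)  # unbuffered scatter-add by weight index
    # Only len(codebook) multiplications are left for the PE.
    return float(np.dot(codebook, sums))

# Sanity check against a plain multiply-accumulate dot product.
rng = np.random.default_rng(0)
codebook = np.linspace(-1.0, 1.0, 16)          # 16 shared weight values
weight_idx = rng.integers(0, 16, size=1000)     # quantized weight indices
x = rng.standard_normal(1000)                   # activations
assert np.isclose(shared_weight_dot(x, weight_idx, codebook),
                  np.dot(codebook[weight_idx], x))
```

The regrouping is exact, not an approximation: for a length-1000 dot product it trades 1000 multiplications for 1000 additions plus 16 multiplications, which is why transferring only the 16 accumulated sums reduces bandwidth to the processing elements.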