Quantized Guided Pruning for Efficient Hardware Implementations of Deep Neural Networks

2020 18th IEEE International New Circuits and Systems Conference (NEWCAS). Pub Date: 2020-06-01. DOI: 10.1109/newcas49341.2020.9159769

G. B. Hacene, Vincent Gripon, M. Arzel, Nicolas Farrugia, Y. Bengio


Citations: 2

Abstract



Deep Neural Networks (DNNs) in general, and Convolutional Neural Networks (CNNs) in particular, are state-of-the-art in numerous computer vision tasks such as object classification and detection. However, the large number of parameters they contain leads to high computational complexity and strongly limits their usability on budget-constrained platforms such as embedded devices. In this paper, we propose a combination of a pruning technique and a quantization scheme that effectively reduces the complexity and memory usage of the convolutional layers of CNNs by replacing the complex convolution operation with a low-cost multiplexer. We perform experiments on the CIFAR10, CIFAR100 and SVHN datasets and show that the proposed method achieves almost state-of-the-art accuracy while drastically reducing the computational and memory footprints compared to the baselines. We also propose an efficient hardware architecture, implemented on Field Programmable Gate Arrays (FPGAs), to accelerate inference; it works as a pipeline and accommodates multiple layers working at the same time. In contrast with most proposed approaches, which rely on external memory or software-defined memory controllers, our work is based on algorithmic optimization and a full-hardware design, enabling a direct, on-chip memory implementation of a DNN while staying close to state-of-the-art accuracy.
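The abstract does not spell out how a convolution becomes a multiplexer, so the following is only an illustrative sketch, not the authors' implementation. It assumes (hypothetically) that pruning leaves a single surviving weight in each 2D kernel slice and that quantization reduces that weight to its sign. Under those assumptions, each output value is just one selected input value, possibly negated, which is what a multiplexer computes. The function names and the `(row, col, sign)` encoding are invented for this example.

```python
def conv2d_valid(image, kernel):
    """Reference 2D cross-correlation, no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def prune_and_quantize(kernel):
    """Assumed scheme: keep the largest-magnitude weight, store only its sign.

    Returns a hypothetical compact encoding (row, col, sign).
    """
    positions = [(r, c) for r in range(len(kernel)) for c in range(len(kernel[0]))]
    di, dj = max(positions, key=lambda rc: abs(kernel[rc[0]][rc[1]]))
    sign = 1 if kernel[di][dj] >= 0 else -1
    return di, dj, sign


def mux_conv2d_valid(image, selector, kernel_shape):
    """Multiplexer view: pick one input per output position, then apply the sign."""
    di, dj, sign = selector
    kh, kw = kernel_shape
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sign * image[i + di][j + dj] for j in range(out_w)]
            for i in range(out_h)]


def dense_from_selector(selector, kernel_shape):
    """Expand the compact encoding back into a dense kernel, for checking only."""
    di, dj, sign = selector
    kh, kw = kernel_shape
    return [[sign if (r, c) == (di, dj) else 0 for c in range(kw)]
            for r in range(kh)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[0.1, -0.2, 0.05],
          [0.3, -0.9, 0.2],
          [0.0, 0.4, -0.1]]

selector = prune_and_quantize(kernel)                  # position and sign of the kept weight
mux_out = mux_conv2d_valid(image, selector, (3, 3))    # selection only, no multiply-accumulate
ref_out = conv2d_valid(image, dense_from_selector(selector, (3, 3)))
print(selector, mux_out, mux_out == ref_out)
```

In this toy setting the selector replaces nine stored weights with one position and one sign bit, and the nine multiply-accumulates per output with a single lookup. The actual pruning criterion, quantization levels, and hardware mapping used in the paper may differ.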
