CNN hardware acceleration on a low-power and low-cost APSoC

P. Meloni, Antonio Garufi, Gianfranco Deriu, Marco Carreras, Daniela Loi
{"title":"CNN hardware acceleration on a low-power and low-cost APSoC","authors":"P. Meloni, Antonio Garufi, Gianfranco Deriu, Marco Carreras, Daniela Loi","doi":"10.1109/DASIP48288.2019.9049213","DOIUrl":null,"url":null,"abstract":"Deep learning and Convolutional Neural Networks (CNNs) in particular, are currently one of the most promising and widely used classes of algorithms in the field of artificial intelligence, being employed in a wide range of tasks. However, their high computational complexity and storage demands limit their efficient deployment on resource-limited embedded systems and IoT devices. To address this problem, in recent years a wide landscape of customized FPGA-based hardware acceleration solutions has been presented in literature, focused on combining high performance and power efficiency. Most of them are implemented on mid- to high-range devices including different computing cores, and target intensive models such as AlexNet and VGG16. In this work, we implement a CNN inference accelerator on a compact and cost-optimized device, the Minized development board from Avnet, integrating a single-core Zynq 7Z007S. We measure the execution time and energy consumption of the developed accelerator, and we compare it with a CPU-based software implementation. The results show that the accelerator achieves a frame rate of 13 fps on the end-to-end execution of ALL-CNN-C model, and 4 fps on DarkNet. Compared with the software implementation, it was 5 times faster providing up to 10.62 giga operations per second (GOPS) at 80 MHz while consuming 1.08 W of on-chip power.","PeriodicalId":120855,"journal":{"name":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DASIP48288.2019.9049213","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5

Abstract

Deep learning, and Convolutional Neural Networks (CNNs) in particular, is currently among the most promising and widely used classes of algorithms in the field of artificial intelligence, employed across a wide range of tasks. However, the high computational complexity and storage demands of CNNs limit their efficient deployment on resource-limited embedded systems and IoT devices. To address this problem, a wide landscape of customized FPGA-based hardware acceleration solutions has been presented in the literature in recent years, focused on combining high performance with power efficiency. Most of these are implemented on mid- to high-range devices featuring multiple computing cores, and target compute-intensive models such as AlexNet and VGG16. In this work, we implement a CNN inference accelerator on a compact and cost-optimized device, the MiniZed development board from Avnet, which integrates a single-core Zynq 7Z007S. We measure the execution time and energy consumption of the developed accelerator and compare it with a CPU-based software implementation. The results show that the accelerator achieves a frame rate of 13 fps on the end-to-end execution of the ALL-CNN-C model, and 4 fps on DarkNet. Compared with the software implementation, it is 5 times faster, providing up to 10.62 giga operations per second (GOPS) at 80 MHz while consuming 1.08 W of on-chip power.
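
Since the abstract quotes raw throughput and power figures, a short back-of-the-envelope script helps put them in perspective. The constants below are taken directly from the abstract; the derived quantities (operations per cycle, energy per frame, GOPS/W) are our own arithmetic, not numbers reported by the authors, and the usual convention that one MAC counts as two operations is assumed.

```python
# Sanity check of the throughput and energy figures quoted in the
# abstract. Constants come from the abstract; derived values are our
# own arithmetic, not results reported by the authors.

clock_hz = 80e6          # accelerator clock frequency: 80 MHz
throughput_gops = 10.62  # peak reported throughput
power_w = 1.08           # reported on-chip power
fps_allcnnc = 13         # end-to-end frame rate on ALL-CNN-C
fps_darknet = 4          # end-to-end frame rate on DarkNet

# Operations completed per clock cycle at peak throughput.
ops_per_cycle = throughput_gops * 1e9 / clock_hz
print(f"ops per cycle:          {ops_per_cycle:.1f}")  # ~132.8

# Energy per processed frame = power / frame rate.
print(f"ALL-CNN-C energy/frame: {power_w / fps_allcnnc * 1e3:.0f} mJ")  # ~83 mJ
print(f"DarkNet energy/frame:   {power_w / fps_darknet * 1e3:.0f} mJ")  # ~270 mJ

# Energy efficiency at peak throughput.
print(f"GOPS/W:                 {throughput_gops / power_w:.2f}")  # ~9.83
```

At roughly 133 operations per cycle, the figures are consistent with a modest parallel MAC array fitting the small 7Z007S fabric, which is what makes the sub-watt, low-cost deployment claim plausible.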