CNN hardware acceleration on a low-power and low-cost APSoC

P. Meloni, Antonio Garufi, Gianfranco Deriu, Marco Carreras, Daniela Loi
{"title":"CNN hardware acceleration on a low-power and low-cost APSoC","authors":"P. Meloni, Antonio Garufi, Gianfranco Deriu, Marco Carreras, Daniela Loi","doi":"10.1109/DASIP48288.2019.9049213","DOIUrl":null,"url":null,"abstract":"Deep learning and Convolutional Neural Networks (CNNs) in particular, are currently one of the most promising and widely used classes of algorithms in the field of artificial intelligence, being employed in a wide range of tasks. However, their high computational complexity and storage demands limit their efficient deployment on resource-limited embedded systems and IoT devices. To address this problem, in recent years a wide landscape of customized FPGA-based hardware acceleration solutions has been presented in literature, focused on combining high performance and power efficiency. Most of them are implemented on mid- to high-range devices including different computing cores, and target intensive models such as AlexNet and VGG16. In this work, we implement a CNN inference accelerator on a compact and cost-optimized device, the Minized development board from Avnet, integrating a single-core Zynq 7Z007S. We measure the execution time and energy consumption of the developed accelerator, and we compare it with a CPU-based software implementation. The results show that the accelerator achieves a frame rate of 13 fps on the end-to-end execution of ALL-CNN-C model, and 4 fps on DarkNet. Compared with the software implementation, it was 5 times faster providing up to 10.62 giga operations per second (GOPS) at 80 MHz while consuming 1.08 W of on-chip power.","PeriodicalId":120855,"journal":{"name":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DASIP48288.2019.9049213","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5

Abstract

Deep learning, and Convolutional Neural Networks (CNNs) in particular, is currently among the most promising and widely used classes of algorithms in the field of artificial intelligence, employed across a wide range of tasks. However, the high computational complexity and storage demands of CNNs limit their efficient deployment on resource-limited embedded systems and IoT devices. To address this problem, a wide landscape of customized FPGA-based hardware acceleration solutions has been presented in the literature in recent years, focused on combining high performance with power efficiency. Most of these are implemented on mid- to high-range devices featuring multiple computing cores, and target compute-intensive models such as AlexNet and VGG16. In this work, we implement a CNN inference accelerator on a compact and cost-optimized device, the MiniZed development board from Avnet, which integrates a single-core Zynq 7Z007S. We measure the execution time and energy consumption of the developed accelerator and compare it with a CPU-based software implementation. The results show that the accelerator achieves a frame rate of 13 fps on the end-to-end execution of the ALL-CNN-C model, and 4 fps on DarkNet. Compared with the software implementation, it is 5 times faster, providing up to 10.62 giga operations per second (GOPS) at 80 MHz while consuming 1.08 W of on-chip power.
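
Since the abstract quotes raw throughput and power figures, a short back-of-the-envelope script helps put them in perspective. The constants below are taken directly from the abstract; the derived quantities (operations per cycle, energy per frame, GOPS/W) are our own arithmetic, not numbers reported by the authors, and the usual convention that one MAC counts as two operations is assumed.

```python
# Sanity check of the throughput and energy figures quoted in the
# abstract. Constants come from the abstract; derived values are our
# own arithmetic, not results reported by the authors.

clock_hz = 80e6          # accelerator clock frequency: 80 MHz
throughput_gops = 10.62  # peak reported throughput
power_w = 1.08           # reported on-chip power
fps_allcnnc = 13         # end-to-end frame rate on ALL-CNN-C
fps_darknet = 4          # end-to-end frame rate on DarkNet

# Operations completed per clock cycle at peak throughput.
ops_per_cycle = throughput_gops * 1e9 / clock_hz
print(f"ops per cycle:          {ops_per_cycle:.1f}")  # ~132.8

# Energy per processed frame = power / frame rate.
print(f"ALL-CNN-C energy/frame: {power_w / fps_allcnnc * 1e3:.0f} mJ")  # ~83 mJ
print(f"DarkNet energy/frame:   {power_w / fps_darknet * 1e3:.0f} mJ")  # ~270 mJ

# Energy efficiency at peak throughput.
print(f"GOPS/W:                 {throughput_gops / power_w:.2f}")  # ~9.83
```

At roughly 133 operations per cycle, the figures are consistent with a modest parallel MAC array fitting the small 7Z007S fabric, which is what makes the sub-watt, low-cost deployment claim plausible.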