An hardware accelerator design of Mobile-Net model on FPGA

Sanjaya M V, M. Rao
{"title":"An hardware accelerator design of Mobile-Net model on FPGA","authors":"Sanjaya M V, M. Rao","doi":"10.1145/3564121.3564124","DOIUrl":null,"url":null,"abstract":"Domain specific hardware architectures and hardware accelerators have been a vital part of modern system design. Especially for math intensive applications involving tasks related to machine perception, incorporating hardware accelerators that work in tandem with general purpose micro-processors can prove to be energy efficient both at server and edge scenarios. FPGAs, due to their reconfigurability makes it possible to have customized hardware designed as per the computational and memory requirements specific to that application. This work proposes an optimized low latency hardware accelerator implementation of Mobile-net V2 CNN on an FPGA. This paper presents an implementation of Mobile-net-V2 inference on a Xilinx Ultrascale+ MPSOC platform incorporating solely half precision floating point arithmetic for both parameters and activations of the network. The proposed implementation is also optimized by merging all batch-norm layers with its preceding convolutional layers. For applications which cannot compromise on performance of the algorithm for execution speed and efficiency, an optimized floating point inference is proposed. The current implementation offers an overall performance improvement of at-least 20X with moderate resource utilization with minimal variance in inference latency, as compared to performing inference on the processor alone with almost no degradation in the model accuracy.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Second International Conference on AI-ML Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3564121.3564124","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Domain-specific hardware architectures and hardware accelerators have become a vital part of modern system design. Especially for math-intensive applications involving machine-perception tasks, hardware accelerators working in tandem with general-purpose microprocessors can prove energy-efficient in both server and edge scenarios. FPGAs, thanks to their reconfigurability, make it possible to customize hardware to the computational and memory requirements of a specific application. This work proposes an optimized, low-latency hardware accelerator implementation of the Mobile-net V2 CNN on an FPGA. The paper presents an implementation of Mobile-net-V2 inference on a Xilinx Ultrascale+ MPSOC platform that uses half-precision floating-point arithmetic exclusively, for both the parameters and the activations of the network. The implementation is further optimized by merging all batch-norm layers into their preceding convolutional layers. For applications that cannot compromise algorithmic accuracy for execution speed and efficiency, this optimized floating-point inference is proposed. Compared to performing inference on the processor alone, the current implementation offers an overall performance improvement of at least 20×, with moderate resource utilization, minimal variance in inference latency, and almost no degradation in model accuracy.
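The two optimizations the abstract names, folding batch-norm layers into their preceding convolutions and storing parameters and activations in half precision, are standard transformations. Below is a minimal PyTorch sketch of both; the function name fold_batchnorm and the toy layer shapes are illustrative assumptions, not taken from the paper's FPGA implementation.

```python
import torch
import torch.nn as nn

def fold_batchnorm(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold an inference-mode BatchNorm2d into the preceding Conv2d.

    For y = conv(x) followed by z = gamma * (y - mean) / sqrt(var + eps) + beta,
    the fused convolution uses, per output channel:
        W' = W * gamma / sqrt(var + eps)
        b' = (b - mean) * gamma / sqrt(var + eps) + beta
    """
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = (bias - bn.running_mean) * scale + bn.bias.data
    return fused

# Sanity check with non-trivial BN statistics. Eval mode matters: folding
# relies on the frozen running statistics, not per-batch ones.
conv = nn.Conv2d(3, 8, 3, padding=1).eval()
bn = nn.BatchNorm2d(8).eval()
bn.running_mean = torch.randn(8)
bn.running_var = torch.rand(8) + 0.5
fused = fold_batchnorm(conv, bn)
x = torch.randn(1, 3, 32, 32)
assert torch.allclose(bn(conv(x)), fused(x), atol=1e-5)

# Casting with .half() mirrors the paper's use of FP16 for both weights and
# activations (the FPGA datapath itself is custom hardware, not PyTorch).
fused_fp16 = fused.half()
```

Because the `groups` argument is propagated, the same fold applies to the depthwise convolutions that dominate Mobile-net V2. The fold happens offline, so the runtime graph has one fewer operation per conv+BN pair, which is part of where the latency savings come from on fixed-function hardware.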
Latest articles in this journal

A Hybrid Planning System for Smart Charging of Electric Fleets
CluSpa: Computation Reduction in CNN Inference by exploiting Clustering and Sparsity
Acceleration-aware, Retraining-free Evolutionary Pruning for Automated Fitment of Deep Learning Models on Edge Devices
Patch-wise Features for Blur Image Classification
Identification of Causal Dependencies in Multivariate Time Series