Concurrent MAC unit design using VHDL for deep learning networks on FPGA

Hossam O. Ahmed, M. Ghoneima, M. Dessouky
{"title":"Concurrent MAC unit design using VHDL for deep learning networks on FPGA","authors":"Hossam O. Ahmed, M. Ghoneima, M. Dessouky","doi":"10.1109/ISCAIE.2018.8405440","DOIUrl":null,"url":null,"abstract":"Deep neural network algorithms have proven their enormous capabilities in wide range of artificial intelligence applications, specially in Printed/Handwritten text recognition, Multimedia processing, Robotics and many other high end technological trends. The most challenging aspect nowadays is to overcome the extremely computational processing demands in applying such algorithms, especially in real-time systems. Recently, the Field Programmable Gate Array (FPGA) has been considered as one of the optimum hardware accelerator platform for accelerating the deep neural network architectures due to its large adaptability and the high degree of parallelism it offers. In this paper, the proposed 8-bits fixed-point parallel multiply-accumulate (MAC) unit architecture aimed to create a fully-customize MAC unit for the Convolutional Neural Networks (CNN) instead of depending on the conventional DSP blocks and embedded memories units on the FPGAs architecture silicon fabrics. The proposed 8-bits fixed-point parallel multiply-accumulate (MAC) unit architecture is designed using VHDL language and can performs a computational speed up to 4.17 Giga Operation per Second (GOPS) using high-density FPGAs.","PeriodicalId":333327,"journal":{"name":"2018 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCAIE.2018.8405440","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

Deep neural network algorithms have proven their enormous capabilities in wide range of artificial intelligence applications, specially in Printed/Handwritten text recognition, Multimedia processing, Robotics and many other high end technological trends. The most challenging aspect nowadays is to overcome the extremely computational processing demands in applying such algorithms, especially in real-time systems. Recently, the Field Programmable Gate Array (FPGA) has been considered as one of the optimum hardware accelerator platform for accelerating the deep neural network architectures due to its large adaptability and the high degree of parallelism it offers. In this paper, the proposed 8-bits fixed-point parallel multiply-accumulate (MAC) unit architecture aimed to create a fully-customize MAC unit for the Convolutional Neural Networks (CNN) instead of depending on the conventional DSP blocks and embedded memories units on the FPGAs architecture silicon fabrics. The proposed 8-bits fixed-point parallel multiply-accumulate (MAC) unit architecture is designed using VHDL language and can performs a computational speed up to 4.17 Giga Operation per Second (GOPS) using high-density FPGAs.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于FPGA的深度学习网络并行MAC单元设计
深度神经网络算法已经在广泛的人工智能应用中证明了其巨大的能力,特别是在印刷/手写文本识别,多媒体处理,机器人和许多其他高端技术趋势中。目前最具挑战性的方面是克服应用这些算法的极端计算处理需求,特别是在实时系统中。近年来,现场可编程门阵列(FPGA)由于其具有较大的适应性和高度的并行性,被认为是加速深度神经网络架构的最佳硬件加速器平台之一。在本文中,提出的8位定点并行乘法累积(MAC)单元架构旨在为卷积神经网络(CNN)创建一个完全定制的MAC单元,而不是依赖于传统的DSP模块和fpga架构硅结构上的嵌入式存储器单元。所提出的8位定点并行乘法累加(MAC)单元架构采用VHDL语言设计,采用高密度fpga可实现高达4.17千兆运算每秒(GOPS)的计算速度。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Improved recurrent NARX neural network model for state of charge estimation of lithium-ion battery using pso algorithm Exploring antecedent factors toward knowledge sharing intention in E-learning The development of sports science knowledge management systems through CommonKADS and digital Kanban board Cancelable biometrics technique for iris recognition Timing analysis for Diffie Hellman Key Exchange In U-BOOT using Raspberry pi
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1