A Self-adaptation Method of Fitting Convolutional Neural Network into FPGA (Abstract Only)

Ning Mao, Zhihong Huang, Xing Wei, He Zhao, Xinkai Di, Le Yu, Haigang Yang
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. Published 2018-02-15. DOI: 10.1145/3174243.3175003

Abstract

In recent years, Convolutional Neural Networks (CNNs) have been widely used in many artificial intelligence (AI) related fields. Among the many implementation platforms for CNNs, the FPGA is regarded as an optimal one because of its high power efficiency and flexibility. Although various FPGA accelerators have been proposed to realize CNNs, some of them are implemented through High-Level Synthesis, e.g., in OpenCL, which may result in inefficient operation performance and resource utilization. Therefore, we propose to parameterize the RTL design at both the algorithm and hardware implementation levels. Four types of parallelism are considered to model the parameterized design: over the input feature maps, the output feature maps, the layers, and the convolution kernel. Meanwhile, a library covering the convolution layer, fully-connected layer, pooling layer, and control module is established to cater for various CNN models. Further, an algorithm is proposed to find the optimal level of parallelism under limited resources. As a case study, four typical CNNs are implemented on the Stratix III EP3SL110, using on-chip memory. Compared with some existing works using an automated design flow, the implementations obtained by the proposed approach achieve up to a 17.13× improvement in throughput (GOPS). By our best estimate, our design also achieves 1.33× the resource efficiency and 3.61× the power efficiency.
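The resource-constrained parallelism search described above can be pictured as an exhaustive design-space exploration over the unrolling factors. The paper does not publish its cost functions or budgets, so the following Python sketch uses purely illustrative models and names (`resources`, `throughput`, `DSP_BUDGET`, `BRAM_BUDGET` are all assumptions, not the authors' formulation):

```python
from itertools import product

# Illustrative resource budgets for a Stratix III EP3SL110-class device
# (assumed numbers, not taken from the paper).
DSP_BUDGET = 288    # multiplier/DSP blocks available
BRAM_BUDGET = 1000  # on-chip memory blocks available

def resources(p_in, p_out, p_kernel):
    """Toy cost model: one MAC lane per (input fm, output fm, kernel) unroll,
    plus buffers proportional to the feature-map parallelism."""
    dsp = p_in * p_out * p_kernel
    bram = 4 * (p_in + p_out)
    return dsp, bram

def throughput(p_in, p_out, p_kernel):
    """Throughput scales with the product of the unrolled dimensions."""
    return p_in * p_out * p_kernel

def best_parallelism(max_factor=16):
    """Enumerate all factor combinations and keep the highest-throughput
    point that fits within the resource budgets."""
    best, best_tp = None, 0
    for p_in, p_out, p_k in product(range(1, max_factor + 1), repeat=3):
        dsp, bram = resources(p_in, p_out, p_k)
        if dsp <= DSP_BUDGET and bram <= BRAM_BUDGET:
            tp = throughput(p_in, p_out, p_k)
            if tp > best_tp:
                best, best_tp = (p_in, p_out, p_k), tp
    return best, best_tp
```

In a real flow the cost models would come from characterizing the RTL library modules (convolution, fully-connected, pooling, control), and pruning or analytical bounds would replace the brute-force loop for larger design spaces.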