Memory Optimization Techniques for FPGA based CNN Implementations

Masoud Shahshahani, Pingakshya Goswami, D. Bhatia
DOI: 10.1109/DCAS.2018.8620112
Venue: 2018 IEEE 13th Dallas Circuits and Systems Conference (DCAS)
Publication date: 2018-11-01
Citations: 6

Abstract

Deep Learning has played an important role in image classification, speech recognition, and natural language processing. Traditionally, these learning algorithms are implemented on clusters of CPUs and GPUs, but as data sizes grow, models built on CPUs and GPUs do not scale. Hence we need a hardware platform that can scale beyond current data and model sizes, and this is where FPGAs come into play. With the advancement of CAD tools for FPGAs, designers no longer need to describe network architectures at the RTL level using HDLs such as Verilog and VHDL; they can build models in high-level languages such as C or C++ using tools like Xilinx Vivado HLS. In addition, the power consumption of FPGA-based deep learning models is substantially lower than that of GPUs. In this paper, we present an extensive survey of FPGA-based deep learning implementations, with emphasis on Convolutional Neural Networks (CNNs). The CNN architectures presented in the literature require large amounts of memory to store weights and images, and it is not possible to hold all of this data in the internal FPGA Block RAM. This paper therefore presents a comprehensive survey of the methods and techniques used in the literature to tackle this memory consumption issue and to reduce data movement between high-capacity external DDR memory and internal BRAM.
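To illustrate the kind of DDR-to-BRAM data-movement reduction the abstract refers to, below is a minimal HLS-style C sketch (not taken from the paper; function and buffer names are hypothetical). Weights that do not fit on chip are copied from a large off-chip array into a small tile-sized buffer in bursts, and all inner-loop accesses then hit the on-chip copy. In an actual Vivado HLS design the buffer would map to BRAM and typically carry partitioning/storage pragmas; here the pragma is shown as a comment and the code compiles as ordinary C.

```c
#include <string.h>

#define TILE 64   /* weights copied on chip per burst (illustrative size) */

/* Hypothetical kernel sketch: weights_ddr models the large external DDR
 * array; w_buf models a small on-chip BRAM buffer. Fetching one tile per
 * memcpy (which HLS infers as an AXI burst) instead of reading DDR
 * element-by-element in the inner loop is the basic data-movement
 * optimization surveyed in the paper. */
int tiled_dot(const int *weights_ddr, const int *inputs, int n)
{
    int w_buf[TILE];            /* on-chip tile buffer */
    /* #pragma HLS bind_storage variable=w_buf type=ram_2p impl=bram */
    int acc = 0;

    for (int t = 0; t < n; t += TILE) {
        int len = (n - t < TILE) ? n - t : TILE;
        memcpy(w_buf, weights_ddr + t, (size_t)len * sizeof(int)); /* one burst */
        for (int i = 0; i < len; i++)
            acc += w_buf[i] * inputs[t + i];   /* inner loop hits BRAM only */
    }
    return acc;
}
```

A common refinement, also covered by the surveyed techniques, is double buffering: two tile buffers alternate so the next burst from DDR overlaps with computation on the current tile.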