Memory Optimization Techniques for FPGA based CNN Implementations
Masoud Shahshahani, Pingakshya Goswami, D. Bhatia
2018 IEEE 13th Dallas Circuits and Systems Conference (DCAS), published 2018-11-01
DOI: 10.1109/DCAS.2018.8620112 (https://doi.org/10.1109/DCAS.2018.8620112)
Deep learning plays an important role in image classification, speech recognition, and natural language processing. Traditionally, these learning algorithms are implemented on clusters of CPUs and GPUs. As data sizes increase, however, models built on CPUs and GPUs do not scale. A hardware model is therefore needed that can scale beyond current data and model sizes, and this is where FPGAs come into play. With the advancement of CAD tools for FPGAs, designers no longer need to describe network architectures at the RTL level using HDLs such as Verilog and VHDL. They can instead build the models in a high-level language such as C or C++ using tools like Xilinx Vivado HLS. In addition, FPGA-based deep learning models consume substantially less power than GPUs. In this paper, we present an extensive survey of implementations of FPGA-based deep learning architectures, with emphasis on Convolutional Neural Networks (CNNs). The CNN architectures presented in the literature consume a large amount of memory for storing weights and images, and this information cannot be stored in the internal FPGA Block RAM (BRAM). This paper presents a comprehensive survey of the methods and techniques used in the literature to tackle the memory consumption issue and to reduce data movement between high-capacity external DDR memory and internal BRAM.
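
To make the memory pressure concrete, the following is a minimal back-of-the-envelope sketch, not a method taken from the paper. The layer shape, the 4-byte weight width, and the 2 MiB on-chip buffer budget are all hypothetical values chosen for illustration, as are the function names `conv_weight_bytes` and `tiles_needed`. It estimates the weight footprint of a single convolution layer and how many buffer-sized chunks would have to be streamed from external memory if the weights do not fit on chip.

```python
from math import ceil


def conv_weight_bytes(out_ch, in_ch, k, bytes_per_weight):
    """Bytes needed for the weights of one k x k convolution layer (biases ignored)."""
    return out_ch * in_ch * k * k * bytes_per_weight


def tiles_needed(total_bytes, buffer_bytes):
    """Number of buffer-sized chunks required to stream total_bytes through a buffer."""
    return ceil(total_bytes / buffer_bytes)


# Hypothetical layer and on-chip budget (assumptions, not figures from the paper).
layer_bytes = conv_weight_bytes(out_ch=512, in_ch=256, k=3, bytes_per_weight=4)
bram_budget = 2 * 1024 * 1024  # assumed 2 MiB of on-chip buffer reserved for weights

print(f"weights for one layer: {layer_bytes} bytes")
print(f"fits in assumed buffer: {layer_bytes <= bram_budget}")
print(f"chunks to stream from external memory: {tiles_needed(layer_bytes, bram_budget)}")
```

Under these assumed numbers, a single layer already exceeds the buffer, so its weights would have to be fetched from external memory in several pieces. Each such fetch is the kind of data movement between DDR and BRAM that the surveyed techniques aim to reduce.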