{"title":"CNN在FPGA上高效硬件实现的优化","authors":"F. Farrukh, Tuo Xie, Chun Zhang, Zhihua Wang","doi":"10.1109/CICTA.2018.8706067","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNN) have been a hot research topic in recent years. The key element of DNN is to explore the real time hardware implementation. However, it requires a complete knowledge of hardware where the DNN is going to be implemented. The computational complexity and resource consumption of DNN is increasing by the time. Convolutional Neural Network (CNN) is the popular architecture of DNN especially for image classification. One requires an efficient implementation strategy of CNN to incorporate more computations in real time. Field Programmable Gate Array (FPGA) is considered to be the energy efficient choice for CNN as compared to Graphical Processing Units (GPUs). In this paper, new idea is explored and implemented for basic Processing Element (PE) of CNN. FPGA has limited built-in multiplier accumulator (MAC) units. In this work, MAC units are replaced by Wallace Tree based Multiplier which belongs to the family of log time array multipliers. The resources are saved in terms of MAC units and we can implement more processing elements on FPGA.","PeriodicalId":186840,"journal":{"name":"2018 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Optimization for Efficient Hardware Implementation of CNN on FPGA\",\"authors\":\"F. Farrukh, Tuo Xie, Chun Zhang, Zhihua Wang\",\"doi\":\"10.1109/CICTA.2018.8706067\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks (DNN) have been a hot research topic in recent years. The key element of DNN is to explore the real time hardware implementation. However, it requires a complete knowledge of hardware where the DNN is going to be implemented. The computational complexity and resource consumption of DNN is increasing by the time. Convolutional Neural Network (CNN) is the popular architecture of DNN especially for image classification. One requires an efficient implementation strategy of CNN to incorporate more computations in real time. Field Programmable Gate Array (FPGA) is considered to be the energy efficient choice for CNN as compared to Graphical Processing Units (GPUs). In this paper, new idea is explored and implemented for basic Processing Element (PE) of CNN. FPGA has limited built-in multiplier accumulator (MAC) units. In this work, MAC units are replaced by Wallace Tree based Multiplier which belongs to the family of log time array multipliers. The resources are saved in terms of MAC units and we can implement more processing elements on FPGA.\",\"PeriodicalId\":186840,\"journal\":{\"name\":\"2018 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA)\",\"volume\":\"43 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CICTA.2018.8706067\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CICTA.2018.8706067","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Optimization for Efficient Hardware Implementation of CNN on FPGA
Deep neural networks (DNN) have been a hot research topic in recent years. The key element of DNN is to explore the real time hardware implementation. However, it requires a complete knowledge of hardware where the DNN is going to be implemented. The computational complexity and resource consumption of DNN is increasing by the time. Convolutional Neural Network (CNN) is the popular architecture of DNN especially for image classification. One requires an efficient implementation strategy of CNN to incorporate more computations in real time. Field Programmable Gate Array (FPGA) is considered to be the energy efficient choice for CNN as compared to Graphical Processing Units (GPUs). In this paper, new idea is explored and implemented for basic Processing Element (PE) of CNN. FPGA has limited built-in multiplier accumulator (MAC) units. In this work, MAC units are replaced by Wallace Tree based Multiplier which belongs to the family of log time array multipliers. The resources are saved in terms of MAC units and we can implement more processing elements on FPGA.