Profiling of CNNs using the MATLAB FPGA-based Deep Learning Processor
S. Spanò, L. Canese, G. Cardarilli
2022 17th Conference on Ph.D Research in Microelectronics and Electronics (PRIME), 2022-06-12
DOI: 10.1109/prime55000.2022.9816841 (https://doi.org/10.1109/prime55000.2022.9816841)
In this paper we assess the performance of the new MATLAB Deep Learning Processor, a hardware architecture for FPGA devices that performs inference with Convolutional Neural Networks (CNNs). We deploy the system on a Xilinx ZCU102 SoC and customize it to maximize its processing performance. We evaluate the hardware resource utilization, the maximum achievable clock frequency, and the power dissipation of the system. Our goal is to find the best-performing networks on the FPGA and, ultimately, to compare the results with a GPU-based counterpart. We conduct an experimental campaign in which the FPGA execution time of several CNNs is profiled and compared with the execution time on an NVIDIA Titan RTX GPU. This enables a comparative performance analysis of the same network running inference on different systems. We consider all the CNNs available in the MATLAB suite that have been pretrained on the ImageNet dataset. Finally, to pinpoint the most cost-effective network, we relate the FPGA prediction time to the accuracy on that dataset.
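One way to relate prediction time to accuracy is to keep only the networks that no other network beats on both criteria at once (a Pareto front). The abstract does not state which criterion the authors use, so the sketch below is an assumption about how such a comparison could be set up. The `Measurement` type, the `pareto_front` function, the network names, and all numeric values are hypothetical placeholders, not results from the paper.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    name: str
    time_ms: float    # prediction time per image, in milliseconds
    accuracy: float   # top-1 accuracy in [0, 1]


def pareto_front(points: list[Measurement]) -> list[Measurement]:
    """Return the measurements that are not dominated by any other one.

    A measurement q dominates p when q is at least as fast and at least
    as accurate as p, and strictly better on one of the two criteria.
    """
    front = []
    for p in points:
        dominated = any(
            q.time_ms <= p.time_ms
            and q.accuracy >= p.accuracy
            and (q.time_ms < p.time_ms or q.accuracy > p.accuracy)
            for q in points
        )
        if not dominated:
            front.append(p)
    # Sort from fastest to slowest for easier reading.
    return sorted(front, key=lambda m: m.time_ms)


# Placeholder values for illustration only; NOT measurements from the paper.
samples = [
    Measurement("net_a", 10.0, 0.70),
    Measurement("net_b", 25.0, 0.75),
    Measurement("net_c", 30.0, 0.72),
    Measurement("net_d", 60.0, 0.80),
]

for m in pareto_front(samples):
    print(f"{m.name}: {m.time_ms:.1f} ms, top-1 {m.accuracy:.2f}")
```

With these placeholder values, `net_c` is dropped because `net_b` is both faster and more accurate, and the remaining networks each represent a different speed/accuracy trade-off. Which of them counts as the most cost-effective still depends on the latency budget of the target application.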