Low Power FPGA-SoC Design Techniques for CNN-based Object Detection Accelerator
Heekyung Kim, K. Choi
2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2019-10-01
DOI: 10.1109/UEMCON47517.2019.8992929 (https://doi.org/10.1109/UEMCON47517.2019.8992929)
Citations: 12
Abstract
This paper shows that existing low-power register-transfer-level (RTL) techniques can serve as an effective low-power design scheme for accelerating CNN-based object recognition systems, in contrast to conventional techniques. Most power-efficient design techniques for CNN acceleration focus on the high-level synthesis (HLS) level, such as memory bandwidth optimization, network architecture reconfiguration, data reuse, and batch normalization. However, these approaches have reached the limits of their effectiveness. Using the post-synthesis RTL code generated by field-programmable gate array (FPGA) manufacturers' tools, we applied the proposed RTL low-power design technique to the original FIFO to reduce power consumption during data transformation. We compared the HLS-optimized result with the RTL-optimized result in terms of power consumption. We configured a testbench for the modified FIFO module and analyzed the estimated power dissipation. The power dissipation attributed to look-up tables (LUTs) and look-up table RAM (LUTRAM) was reduced by 54% and 49%, respectively, even though increased block RAM (BRAM) usage raised its power dissipation by 154%. Overall, total power consumption decreased by 10%. This paper discusses factors in FPGA system-on-chip (FPGA-SoC) design that affect the power consumption of CNN-based hardware implementations: RTL architecture, memory design architecture, and model-architecture-based hardware implementation methods. The virtual additional memory can support high throughput at full speed. Our simulated low-power schemes, applied to the Processing System (PS) and Programmable Logic (PL) architecture, reduced power consumption by 25.9% in the FIFO data transformation. We established that the increased number of LUT blocks affects power efficiency and reduces the power consumption of the PL design by up to 49%.
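The abstract reports per-resource changes (LUT, LUTRAM, BRAM) alongside a smaller net change in total power. The sketch below illustrates how such per-component changes combine into one total. The baseline milliwatt figures and the "other" category are hypothetical, chosen only for illustration and not taken from the paper; only the three percentage changes come from the abstract. The function name `total_power_change` is likewise an assumption.

```python
def total_power_change(baseline_mw, relative_change):
    """Return the relative change in total power.

    baseline_mw: per-component baseline power in mW.
    relative_change: per-component fractional change (e.g. -0.54 for -54%);
    components not listed are treated as unchanged.
    """
    old_total = sum(baseline_mw.values())
    new_total = sum(
        power * (1.0 + relative_change.get(name, 0.0))
        for name, power in baseline_mw.items()
    )
    return new_total / old_total - 1.0


# HYPOTHETICAL baseline breakdown (not from the paper).
baseline_mw = {"LUT": 50.0, "LUTRAM": 20.0, "BRAM": 10.0, "other": 100.0}

# Per-resource changes quoted in the abstract.
relative_change = {"LUT": -0.54, "LUTRAM": -0.49, "BRAM": 1.54}

change = total_power_change(baseline_mw, relative_change)
print(f"total power change: {change:+.1%}")
```

The net figure depends entirely on how the baseline power is distributed across components: a large relative increase in a small contributor such as BRAM can be outweighed by reductions in larger ones, and unchanged components dilute the total.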