CNN hardware acceleration on a low-power and low-cost APSoC
Pub Date: 2019-10-01
DOI: 10.1109/DASIP48288.2019.9049213
P. Meloni, Antonio Garufi, Gianfranco Deriu, Marco Carreras, Daniela Loi
Deep learning, and Convolutional Neural Networks (CNNs) in particular, is currently one of the most promising and widely used classes of algorithms in the field of artificial intelligence, being employed in a wide range of tasks. However, their high computational complexity and storage demands limit their efficient deployment on resource-limited embedded systems and IoT devices. To address this problem, in recent years a wide landscape of customized FPGA-based hardware acceleration solutions has been presented in the literature, focused on combining high performance and power efficiency. Most of them are implemented on mid- to high-range devices integrating different computing cores, and target compute-intensive models such as AlexNet and VGG16. In this work, we implement a CNN inference accelerator on a compact and cost-optimized device, the MiniZed development board from Avnet, which integrates a single-core Zynq 7Z007S. We measure the execution time and energy consumption of the developed accelerator and compare it with a CPU-based software implementation. The results show that the accelerator achieves a frame rate of 13 fps on the end-to-end execution of the ALL-CNN-C model, and 4 fps on DarkNet. Compared with the software implementation, it is 5 times faster, providing up to 10.62 giga operations per second (GOPS) at 80 MHz while consuming 1.08 W of on-chip power.
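As a quick reading aid, the snippet below is an illustrative back-of-the-envelope calculation only, using the figures quoted in the abstract (1.08 W, 10.62 GOPS, 13 fps and 4 fps); the derived per-frame energy and operation counts are not taken from the paper, and the operation count assumes the peak throughput is sustained.

```python
# Illustrative check of the figures quoted above (not from the paper):
# per-frame energy and the operation count implied by the frame rates.

power_w = 1.08        # reported on-chip power
peak_gops = 10.62     # reported throughput at 80 MHz
frame_rates = {"ALL-CNN-C": 13, "DarkNet": 4}   # reported fps

for model, fps in frame_rates.items():
    energy_per_frame_j = power_w / fps           # W * (s/frame) = J/frame
    ops_per_frame_g = peak_gops / fps            # upper bound at peak throughput
    print(f"{model}: ~{energy_per_frame_j:.3f} J/frame, "
          f"<= {ops_per_frame_g:.2f} Gop/frame assuming peak GOPS")
```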
{"title":"CNN hardware acceleration on a low-power and low-cost APSoC","authors":"P. Meloni, Antonio Garufi, Gianfranco Deriu, Marco Carreras, Daniela Loi","doi":"10.1109/DASIP48288.2019.9049213","DOIUrl":"https://doi.org/10.1109/DASIP48288.2019.9049213","url":null,"abstract":"Deep learning and Convolutional Neural Networks (CNNs) in particular, are currently one of the most promising and widely used classes of algorithms in the field of artificial intelligence, being employed in a wide range of tasks. However, their high computational complexity and storage demands limit their efficient deployment on resource-limited embedded systems and IoT devices. To address this problem, in recent years a wide landscape of customized FPGA-based hardware acceleration solutions has been presented in literature, focused on combining high performance and power efficiency. Most of them are implemented on mid- to high-range devices including different computing cores, and target intensive models such as AlexNet and VGG16. In this work, we implement a CNN inference accelerator on a compact and cost-optimized device, the Minized development board from Avnet, integrating a single-core Zynq 7Z007S. We measure the execution time and energy consumption of the developed accelerator, and we compare it with a CPU-based software implementation. The results show that the accelerator achieves a frame rate of 13 fps on the end-to-end execution of ALL-CNN-C model, and 4 fps on DarkNet. Compared with the software implementation, it was 5 times faster providing up to 10.62 giga operations per second (GOPS) at 80 MHz while consuming 1.08 W of on-chip power.","PeriodicalId":120855,"journal":{"name":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134270916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SparseCCL: Connected Components Labeling and Analysis for sparse images
Pub Date: 2019-10-01
DOI: 10.1109/DASIP48288.2019.9049184
A. Hennequin, Benjamin Couturier, V. Gligorov, L. Lacassagne
Connected components labeling and analysis for dense images have been extensively studied on a wide range of architectures. Some applications, such as particle detectors in High Energy Physics, need to analyse many small and sparse images at high throughput. Because they process all pixels of the image, classic algorithms for dense images are inefficient on sparse data. We address this inefficiency by introducing a new algorithm specifically designed for sparse images. We show that we can further improve this sparse algorithm by specializing it for the data input format, avoiding a decoding step and processing multiple pixels at once. A benchmark on Intel and AMD CPUs shows that the algorithm is from 1.6× to 2.5× faster on sparse images.
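To make the sparse-versus-dense distinction concrete, here is a minimal union-find sketch of connected-components labeling driven only by the list of non-zero pixels. It is not the authors' SparseCCL implementation: the coordinate-list input format, the 8-connectivity choice, and the `sparse_ccl` name are assumptions made for illustration.

```python
# Minimal union-find CCL over a sparse pixel list (illustrative sketch,
# not the SparseCCL algorithm from the paper). Input: iterable of (x, y)
# coordinates of non-zero pixels; output: {pixel: component label}.

def sparse_ccl(pixels, connectivity=8):
    pixels = set(pixels)
    parent = {p: p for p in pixels}

    def find(p):                      # find root with path halving
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def union(a, b):                  # merge the two components
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Each adjacent pair is handled once, from the raster-later pixel,
    # so only "backward" neighbour offsets need to be checked.
    if connectivity == 8:
        offsets = [(-1, -1), (0, -1), (1, -1), (-1, 0)]
    else:
        offsets = [(0, -1), (-1, 0)]

    for (x, y) in pixels:             # visit only the non-zero pixels
        for dx, dy in offsets:
            if (x + dx, y + dy) in pixels:
                union((x + dx, y + dy), (x, y))

    return {p: find(p) for p in pixels}

# Example: two separate clusters of hits in a mostly empty image.
labels = sparse_ccl([(0, 0), (1, 0), (10, 10), (11, 11)])
```

The key point the abstract makes is visible here: the cost scales with the number of non-zero pixels rather than with the image area, which is why a dense raster-scan algorithm wastes work on mostly empty detector images.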
{"title":"SparseCCL: Connected Components Labeling and Analysis for sparse images","authors":"A. Hennequin, Benjamin Couturier, V. Gligorov, L. Lacassagne","doi":"10.1109/DASIP48288.2019.9049184","DOIUrl":"https://doi.org/10.1109/DASIP48288.2019.9049184","url":null,"abstract":"Connected components labeling and analysis for dense images have been extensively studied on a wide range of architectures. Some applications, like particles detectors in High Energy Physics, need to analyse many small and sparse images at high throughput. Because they process all pixels of the image, classic algorithms for dense images are inefficient on sparse data. We address this inefficiency by introducing a new algorithm specifically designed for sparse images. We show that we can further improve this sparse algorithm by specializing it for the data input format, avoiding a decoding step and processing multiple pixels at once. A benchmark on Intel and AMD CPUs shows that the algorithm is from x 1.6 to x 2.5 faster on sparse images.","PeriodicalId":120855,"journal":{"name":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129737978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}