Sparse Matrix-Dense Matrix Multiplication on Heterogeneous CPU+FPGA Embedded System
Authors: Mohammad Hosseinabady, J. Núñez-Yáñez
DOI: 10.1145/3381427.3381428 (https://doi.org/10.1145/3381427.3381428)
Published: 2020-01-21
Citations: 4
Abstract
Embedded intelligence is becoming the primary driver of new applications in industry, healthcare, and the automotive sector, to name a few. These applications are characterized by high computational demand, real-time interaction with the environment, security, low power consumption, and local autonomy, among others. To address these diverse requirements, researchers have proposed heterogeneous multicore embedded systems comprising CPUs, GPUs, FPGAs, and ASICs. While each computing element provides a unique capability that supports one of these characteristics, getting the processing elements to collaborate on a single application for maximum performance remains a crucial challenge. This paper considers the collaborative use of a multicore CPU and an FPGA in a heterogeneous embedded system to improve the performance of sparse matrix operations, which are essential for reducing inference complexity in machine learning, especially in deep convolutional neural networks. Experimental results show that collaborative execution of sparse-matrix-dense-matrix multiplication on the Xilinx Zynq MPSoC, a heterogeneous CPU+FPGA embedded system, improves performance by up to 42% compared with using the FPGA alone as an accelerator.
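
To make the idea of collaborative execution concrete, the following is a minimal sketch of a sparse-matrix-dense-matrix product whose rows are divided between two workers. It is not the paper's implementation: the abstract does not say how the matrix is stored or how work is divided, so the CSR layout, the split by nonzero count, the `fpga_fraction` parameter, and all function names here are assumptions made for illustration. Two Python threads merely stand in for the FPGA and the CPU.

```python
from concurrent.futures import ThreadPoolExecutor


def spmm_rows(indptr, indices, data, dense, row_start, row_end):
    """Multiply rows [row_start, row_end) of a CSR matrix by a dense matrix."""
    n_cols = len(dense[0])
    out = []
    for r in range(row_start, row_end):
        acc = [0.0] * n_cols
        # Walk the nonzeros of row r and accumulate scaled rows of `dense`.
        for k in range(indptr[r], indptr[r + 1]):
            a = data[k]
            b_row = dense[indices[k]]
            for j in range(n_cols):
                acc[j] += a * b_row[j]
        out.append(acc)
    return out


def split_row(indptr, fraction):
    """Return the first row boundary holding at least `fraction` of the nonzeros."""
    target = fraction * indptr[-1]
    n_rows = len(indptr) - 1
    for r in range(n_rows + 1):
        if indptr[r] >= target:
            return r
    return n_rows


def collaborative_spmm(indptr, indices, data, dense, fpga_fraction=0.5):
    """Hypothetical split: rows before `cut` go to one worker, the rest to the other."""
    cut = split_row(indptr, fpga_fraction)
    n_rows = len(indptr) - 1
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Stand-in for the accelerator share of the work (assumption).
        first = pool.submit(spmm_rows, indptr, indices, data, dense, 0, cut)
        # Stand-in for the CPU share of the work (assumption).
        second = pool.submit(spmm_rows, indptr, indices, data, dense, cut, n_rows)
        return first.result() + second.result()


# Sparse A = [[1, 0, 2], [0, 0, 3], [4, 5, 0]] in CSR form.
indptr = [0, 2, 3, 5]
indices = [0, 2, 2, 0, 1]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
dense = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

result = collaborative_spmm(indptr, indices, data, dense, fpga_fraction=0.5)
print(result)
```

Splitting by nonzero count rather than by row count is one plausible way to balance the load, since the cost of each row scales with its number of nonzeros. On real hardware the split ratio would presumably be tuned to the relative throughput of the two devices.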