Sparse Matrix-Dense Matrix Multiplication on Heterogeneous CPU+FPGA Embedded System

Q4 Social Sciences Meta: Avaliacao Pub Date : 2020-01-21 DOI:10.1145/3381427.3381428
Mohammad Hosseinabady, J. Núñez-Yáñez
Citations: 4

Abstract

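Embedded intelligence is becoming the primary driver for new applications in industry, healthcare, and automotive, among other sectors. These applications are characterized by high computational demand, real-time interaction with the environment, security, low power consumption, and local autonomy. To address these diverse characteristics, researchers have proposed heterogeneous multicore embedded systems comprising CPUs, GPUs, FPGAs, and ASICs. While each computing element provides a unique capability that supports one of the application characteristics, getting these processing cores to collaborate on a single application for maximum performance is a crucial challenge. This paper considers the collaborative use of a multicore CPU and an FPGA in a heterogeneous embedded system to improve the performance of sparse matrix operations, which are essential for reducing inference complexity in machine learning, especially in deep convolutional neural networks. Experimental results show that collaborative execution of sparse matrix-dense matrix multiplication on the Xilinx Zynq MPSoC, a heterogeneous CPU+FPGA embedded system, can improve performance by up to 42% compared with using the FPGA alone as an accelerator.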
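The abstract does not describe how the work is divided between the CPU and the FPGA, so the following is only a minimal sketch of the general idea of collaborative sparse matrix-dense matrix multiplication, not the authors' implementation. It assumes the sparse matrix is stored in CSR form and that its rows are split into two contiguous blocks by nonzero count; two Python threads stand in for the two compute devices. All function names and the `fraction` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def spmm_csr_rows(indptr, indices, data, dense, row_start, row_end):
    """Multiply rows [row_start, row_end) of a CSR matrix by a dense matrix."""
    n_cols = len(dense[0])
    out = []
    for r in range(row_start, row_end):
        acc = [0.0] * n_cols
        # Visit only the stored nonzeros of row r.
        for k in range(indptr[r], indptr[r + 1]):
            a = data[k]
            b_row = dense[indices[k]]
            for j in range(n_cols):
                acc[j] += a * b_row[j]
        out.append(acc)
    return out


def split_row_by_nnz(indptr, fraction):
    """Return the first row index whose preceding rows hold at least
    `fraction` of all nonzeros (indptr[r] counts nonzeros in rows < r)."""
    target = fraction * indptr[-1]
    for r in range(len(indptr)):
        if indptr[r] >= target:
            return r
    return len(indptr) - 1


def collaborative_spmm(indptr, indices, data, dense, fraction=0.5):
    """Compute sparse @ dense with two workers, each taking one row block.

    Assumption: worker 1 plays the role of one device and worker 2 the
    other; `fraction` is the share of nonzeros given to worker 1.
    """
    n_rows = len(indptr) - 1
    split = split_row_by_nnz(indptr, fraction)
    with ThreadPoolExecutor(max_workers=2) as pool:
        top = pool.submit(spmm_csr_rows, indptr, indices, data, dense, 0, split)
        bottom = pool.submit(spmm_csr_rows, indptr, indices, data, dense, split, n_rows)
        # Row blocks are contiguous, so concatenation restores row order.
        return top.result() + bottom.result()
```

In a real heterogeneous system the split ratio would presumably be tuned to the relative throughput of the two devices; here it is just a fixed parameter for illustration.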