A Software/Hardware Co-design Local Irregular Sparsity Method for Accelerating CNNs on FPGA

Workshop Proceedings of the 51st International Conference on Parallel Processing. Pub Date: 2022-08-29. DOI: 10.1145/3547276.3548521

Jiangwei Shang, Zhan Zhang, Chuanyou Li, Kun Zhang, Lei Qian, Hongwei Liu

Abstract

Convolutional neural networks (CNNs) are widely used across many areas. Their success comes with a huge number of parameters and computations, and today's CNNs keep moving toward larger structures. Although larger structures often improve inference accuracy, the growing size also slows inference down. Recently, various parameter sparsity methods have been proposed to accelerate CNNs by reducing the number of parameters and computations. Existing sparsity methods can be classified into two categories: unstructured and structured. Unstructured sparsity methods easily introduce irregularity and therefore yield a suboptimal speedup. Structured sparsity methods, on the other hand, preserve regularity by pruning parameters according to a fixed pattern, but they reach only low sparsity. In this paper, we propose a software/hardware co-design approach that brings local irregular sparsity into CNNs. Benefiting from the local irregularity, we design a row-wise computing engine, RConv Engine, to achieve workload balance and a remarkable speedup. The experimental results show that our software/hardware co-design method achieves a 10.9x speedup over state-of-the-art methods with negligible accuracy loss.
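
The trade-off between irregularity and workload balance described above can be illustrated with a small NumPy sketch. The abstract does not define local irregular sparsity or the internals of RConv Engine, so this is not the authors' method: it assumes, for illustration only, that "local" means each weight row gets the same non-zero budget while positions inside the row stay unconstrained. The function names `prune_global`, `prune_per_row`, and `nonzeros_per_row` are hypothetical.

```python
import numpy as np


def prune_global(weights, sparsity):
    """Unstructured pruning: zero the smallest-magnitude weights of the whole matrix."""
    k = int(round(weights.size * sparsity))  # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return np.where(np.abs(weights) > threshold, weights, 0.0)


def prune_per_row(weights, sparsity):
    """Assumed 'local' variant: each row keeps the same number of weights,
    but the kept positions inside a row are unconstrained (irregular)."""
    rows, cols = weights.shape
    keep = cols - int(round(cols * sparsity))  # non-zero budget per row
    pruned = np.zeros_like(weights)
    if keep == 0:
        return pruned
    # Column indices of the `keep` largest-magnitude weights in every row.
    top = np.argsort(np.abs(weights), axis=1)[:, cols - keep:]
    row_idx = np.arange(rows)[:, None]
    pruned[row_idx, top] = weights[row_idx, top]
    return pruned


def nonzeros_per_row(weights):
    """Multiply-accumulate work a row-wise engine would do for each row."""
    return np.count_nonzero(weights, axis=1)


rng = np.random.default_rng(0)
# Rows with different magnitudes, so global pruning hits some rows harder.
w = rng.normal(size=(8, 16)) * rng.uniform(0.1, 2.0, size=(8, 1))

global_counts = nonzeros_per_row(prune_global(w, 0.75))
row_counts = nonzeros_per_row(prune_per_row(w, 0.75))

print("global pruning, non-zeros per row: ", global_counts)
print("per-row pruning, non-zeros per row:", row_counts)
```

Both variants keep the same total number of weights. Under the per-row budget every row carries an identical amount of work, which is the kind of balance a row-wise engine could exploit, whereas global pruning lets the per-row counts vary.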
