基于FPGA的稀疏卷积神经网络并行柔性卷积核

Q4 Engineering IPSJ Transactions on System LSI Design Methodology Pub Date : 2019-01-01 DOI:10.2197/ipsjtsldm.12.22

Salita Sombatsiri, S. Shibata, Yuki Kobayashi, Hiroaki Inoue, Takashi Takenaka, T. Hosomi, Jaehoon Yu, Yoshinori Takeuchi

{"title":"基于FPGA的稀疏卷积神经网络并行柔性卷积核","authors":"Salita Sombatsiri, S. Shibata, Yuki Kobayashi, Hiroaki Inoue, Takashi Takenaka, T. Hosomi, Jaehoon Yu, Yoshinori Takeuchi","doi":"10.2197/ipsjtsldm.12.22","DOIUrl":null,"url":null,"abstract":"This paper proposes a convolution core for sparse CNN that is capable of flexibly alternating the parallelism schemes and degree exploiting intraand inter-output parallelism of the convolutional layer, and leveraging weight sparsity using a compressed sparse model in the compressed sparse column format and output-stationary dataflow. The experimental results show that the performance is improved by 3.9 times even in the deeper layer where the conventional accelerator could not fully exploit the parallelism due to the small layer size. The proposed architecture could also exploit the weight sparsity. Then, by combining both the multi-parallelism and the weight sparsity, the proposed architecture achieved 5.2 times better performance than the conventional accelerator.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"9 1","pages":"22-37"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Parallelism-flexible Convolution Core for Sparse Convolutional Neural Networks on FPGA\",\"authors\":\"Salita Sombatsiri, S. Shibata, Yuki Kobayashi, Hiroaki Inoue, Takashi Takenaka, T. Hosomi, Jaehoon Yu, Yoshinori Takeuchi\",\"doi\":\"10.2197/ipsjtsldm.12.22\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper proposes a convolution core for sparse CNN that is capable of flexibly alternating the parallelism schemes and degree exploiting intraand inter-output parallelism of the convolutional layer, and leveraging weight sparsity using a compressed sparse model in the compressed sparse column format and output-stationary dataflow. The experimental results show that the performance is improved by 3.9 times even in the deeper layer where the conventional accelerator could not fully exploit the parallelism due to the small layer size. The proposed architecture could also exploit the weight sparsity. Then, by combining both the multi-parallelism and the weight sparsity, the proposed architecture achieved 5.2 times better performance than the conventional accelerator.\",\"PeriodicalId\":38964,\"journal\":{\"name\":\"IPSJ Transactions on System LSI Design Methodology\",\"volume\":\"9 1\",\"pages\":\"22-37\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IPSJ Transactions on System LSI Design Methodology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2197/ipsjtsldm.12.22\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"Engineering\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IPSJ Transactions on System LSI Design Methodology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2197/ipsjtsldm.12.22","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Engineering","Score":null,"Total":0}

引用次数: 4

摘要

本文提出了一种稀疏CNN的卷积核，该卷积核能够灵活地交替并行方案和度，利用卷积层的输出内并行性和输出间并行性，并在压缩稀疏列格式和输出平稳数据流中使用压缩稀疏模型利用权稀疏性。实验结果表明，在传统加速器由于层数小而无法充分发挥并行性的情况下，即使在较深的层中，性能也提高了3.9倍。所提出的体系结构还可以利用权重稀疏性。然后，结合多重并行性和权值稀疏性，该架构的性能比传统加速器提高了5.2倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Parallelism-flexible Convolution Core for Sparse Convolutional Neural Networks on FPGA

This paper proposes a convolution core for sparse CNN that is capable of flexibly alternating the parallelism schemes and degree exploiting intraand inter-output parallelism of the convolutional layer, and leveraging weight sparsity using a compressed sparse model in the compressed sparse column format and output-stationary dataflow. The experimental results show that the performance is improved by 3.9 times even in the deeper layer where the conventional accelerator could not fully exploit the parallelism due to the small layer size. The proposed architecture could also exploit the weight sparsity. Then, by combining both the multi-parallelism and the weight sparsity, the proposed architecture achieved 5.2 times better performance than the conventional accelerator.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IPSJ Transactions on System LSI Design Methodology Engineering-Electrical and Electronic Engineering

CiteScore

1.20

自引率

0.00%

发文量