{"title":"Fast-ABC: A Fast Architecture for Bottleneck-Like Based Convolutional Neural Networks","authors":"Xiaoru Xie, Fangxuan Sun, Jun Lin, Zhongfeng Wang","doi":"10.1109/ISVLSI.2019.00010","DOIUrl":null,"url":null,"abstract":"In recent years, studies on efficient inference of neural networks have become one of the most popular research fields. In order to reduce the required number of computations and weights, many efforts have been made to construct light weight networks (LWNs) where bottleneck-like operations (BLOs) have been widely adopted. However, most current hardware accelerators are not able to utilize the optimization space for BLOs. This paper firstly show that the conventional computational flows employed by most existing accelerators will incur extremely low resource utilization ratio due to the extremely high DRAM bandwidth requirements in these LWNs via both theoretic analysis and experimental results. To address this issue, a partial fusion strategy which can drastically reduce bandwidth requirement is proposed. Additionaly, Winograd algorithm is also employed to further reduce the computational complexity. Based on these, an efficient accelerator for BLO-based networks called Fast Architecture for Bottleneck-like based Convolutional neural networks (Fast-ABC) is proposed. Fast-ABC is implemented on Altera Stratix V GSMD8, and can achieve a very high throughput of up to 137 fps and 264 fps on ResNet-18 and MobileNetV2, respectively. Implementation results show that the proposed architecture significantly improve the throughput on LWNs compared with the prior arts with even much less resources cost.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"32 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVLSI.2019.00010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
In recent years, efficient inference of neural networks has become one of the most popular research fields. To reduce the required number of computations and weights, many efforts have been made to construct lightweight networks (LWNs), in which bottleneck-like operations (BLOs) have been widely adopted. However, most current hardware accelerators are unable to exploit the optimization space offered by BLOs. This paper first shows, through both theoretical analysis and experimental results, that the conventional computational flows employed by most existing accelerators incur extremely low resource utilization on these LWNs due to their extremely high DRAM bandwidth requirements. To address this issue, a partial fusion strategy that drastically reduces the bandwidth requirement is proposed. Additionally, the Winograd algorithm is employed to further reduce the computational complexity. Based on these techniques, an efficient accelerator for BLO-based networks, called Fast Architecture for Bottleneck-like based Convolutional neural networks (Fast-ABC), is proposed. Fast-ABC is implemented on an Altera Stratix V GSMD8 and achieves a very high throughput of up to 137 fps on ResNet-18 and 264 fps on MobileNetV2. Implementation results show that the proposed architecture significantly improves throughput on LWNs compared with prior art, at a much lower resource cost.
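
For concreteness, the sketch below shows one bottleneck-like operation in the MobileNetV2 inverted-residual style (1x1 expansion, 3x3 depthwise convolution, 1x1 projection), written in PyTorch. The channel counts and tensor sizes are illustrative assumptions, not values taken from the paper.

    # Minimal sketch of a bottleneck-like operation (BLO), MobileNetV2
    # inverted-residual style. All sizes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class BottleneckBlock(nn.Module):
        def __init__(self, in_ch=32, expand=6, out_ch=32):
            super().__init__()
            mid = in_ch * expand
            self.block = nn.Sequential(
                nn.Conv2d(in_ch, mid, kernel_size=1, bias=False),   # 1x1 expansion
                nn.BatchNorm2d(mid),
                nn.ReLU6(inplace=True),
                nn.Conv2d(mid, mid, kernel_size=3, padding=1,
                          groups=mid, bias=False),                  # 3x3 depthwise
                nn.BatchNorm2d(mid),
                nn.ReLU6(inplace=True),
                nn.Conv2d(mid, out_ch, kernel_size=1, bias=False),  # 1x1 projection
                nn.BatchNorm2d(out_ch),
            )
            self.use_residual = in_ch == out_ch

        def forward(self, x):
            y = self.block(x)
            return x + y if self.use_residual else y

    x = torch.randn(1, 32, 56, 56)
    print(BottleneckBlock()(x).shape)  # torch.Size([1, 32, 56, 56])

Note that the expanded intermediate feature maps (mid channels) are large relative to the block's weights, which is why such blocks stress memory bandwidth rather than arithmetic.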
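The bandwidth problem targeted by the paper can be illustrated with a back-of-the-envelope calculation: executing the block above layer by layer spills the large expanded intermediate maps to DRAM, while fusing the layers keeps them in on-chip buffers. The sketch below compares the two extremes under the same illustrative sizes; the paper's partial fusion strategy lies between these extremes, and the figures here are assumptions rather than results from the paper.

    # Back-of-the-envelope DRAM-traffic comparison (illustrative, not
    # figures from the paper) for one MobileNetV2-style block.
    H = W = 56; in_ch = out_ch = 32; mid = in_ch * 6   # expanded channels
    fmap = lambda c: H * W * c                          # elements per feature map

    # Layer-by-layer execution: every intermediate map is written to and
    # read back from DRAM.
    unfused = (fmap(in_ch)            # read block input
               + 2 * fmap(mid)        # write + read expanded map
               + 2 * fmap(mid)        # write + read depthwise output
               + fmap(out_ch))        # write block output

    # Fully fused execution: only the block's input and output touch DRAM;
    # intermediates stay on chip.
    fused = fmap(in_ch) + fmap(out_ch)

    print(f"unfused: {unfused} elements, fused: {fused} elements, "
          f"ratio: {unfused / fused:.1f}x")   # 13.0x less traffic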
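Likewise, the Winograd algorithm employed by the paper trades multiplications for cheap additions. The following worked 1-D F(2,3) example uses the standard Lavin-Gray transform matrices, which may differ in detail from the paper's variant: two outputs of a 3-tap convolution are computed with 4 multiplications instead of 6, and the result is checked against direct convolution.

    # Worked 1-D Winograd F(2,3) example (standard transform matrices;
    # an illustrative sketch, not code from the paper).
    import numpy as np

    BT = np.array([[1,  0, -1,  0],
                   [0,  1,  1,  0],
                   [0, -1,  1,  0],
                   [0,  1,  0, -1]], dtype=float)
    G = np.array([[1.0,  0.0, 0.0],
                  [0.5,  0.5, 0.5],
                  [0.5, -0.5, 0.5],
                  [0.0,  0.0, 1.0]])
    AT = np.array([[1, 1,  1,  0],
                   [0, 1, -1, -1]], dtype=float)

    def winograd_f23(d, g):
        """Two outputs of correlating a 4-tap input d with a 3-tap filter g."""
        U = G @ g            # transform the filter (precomputable offline)
        V = BT @ d           # transform the input tile
        M = U * V            # 4 element-wise multiplications
        return AT @ M        # inverse transform -> 2 outputs

    d = np.array([1.0, 2.0, 3.0, 4.0])
    g = np.array([0.5, 1.0, -1.0])
    direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                       d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
    print(winograd_f23(d, g), direct)  # both: [-0.5  0.]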