{"title":"Fast-ABC: A Fast Architecture for Bottleneck-Like Based Convolutional Neural Networks","authors":"Xiaoru Xie, Fangxuan Sun, Jun Lin, Zhongfeng Wang","doi":"10.1109/ISVLSI.2019.00010","DOIUrl":null,"url":null,"abstract":"In recent years, studies on efficient inference of neural networks have become one of the most popular research fields. In order to reduce the required number of computations and weights, many efforts have been made to construct light weight networks (LWNs) where bottleneck-like operations (BLOs) have been widely adopted. However, most current hardware accelerators are not able to utilize the optimization space for BLOs. This paper firstly show that the conventional computational flows employed by most existing accelerators will incur extremely low resource utilization ratio due to the extremely high DRAM bandwidth requirements in these LWNs via both theoretic analysis and experimental results. To address this issue, a partial fusion strategy which can drastically reduce bandwidth requirement is proposed. Additionaly, Winograd algorithm is also employed to further reduce the computational complexity. Based on these, an efficient accelerator for BLO-based networks called Fast Architecture for Bottleneck-like based Convolutional neural networks (Fast-ABC) is proposed. Fast-ABC is implemented on Altera Stratix V GSMD8, and can achieve a very high throughput of up to 137 fps and 264 fps on ResNet-18 and MobileNetV2, respectively. Implementation results show that the proposed architecture significantly improve the throughput on LWNs compared with the prior arts with even much less resources cost.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"32 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVLSI.2019.00010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
In recent years, efficient inference of neural networks has become one of the most popular research fields. To reduce the required number of computations and weights, many efforts have been made to construct lightweight networks (LWNs), in which bottleneck-like operations (BLOs) have been widely adopted. However, most current hardware accelerators are unable to exploit the optimization space offered by BLOs. This paper first shows, through both theoretical analysis and experimental results, that the conventional computational flows employed by most existing accelerators incur extremely low resource utilization on these LWNs due to their extremely high DRAM bandwidth requirements. To address this issue, a partial fusion strategy that drastically reduces the bandwidth requirement is proposed. Additionally, the Winograd algorithm is employed to further reduce the computational complexity. Based on these techniques, an efficient accelerator for BLO-based networks, called Fast Architecture for Bottleneck-like based Convolutional neural networks (Fast-ABC), is proposed. Fast-ABC is implemented on an Altera Stratix V GSMD8 and achieves a very high throughput of up to 137 fps on ResNet-18 and 264 fps on MobileNetV2. Implementation results show that the proposed architecture significantly improves throughput on LWNs compared with prior art, at a much lower resource cost.
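
For concreteness, the sketch below shows one bottleneck-like operation in the MobileNetV2 inverted-residual style (1x1 expansion, 3x3 depthwise convolution, 1x1 projection), written in PyTorch. The channel counts and tensor sizes are illustrative assumptions, not values taken from the paper.

    # Minimal sketch of a bottleneck-like operation (BLO), MobileNetV2
    # inverted-residual style. All sizes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class BottleneckBlock(nn.Module):
        def __init__(self, in_ch=32, expand=6, out_ch=32):
            super().__init__()
            mid = in_ch * expand
            self.block = nn.Sequential(
                nn.Conv2d(in_ch, mid, kernel_size=1, bias=False),   # 1x1 expansion
                nn.BatchNorm2d(mid),
                nn.ReLU6(inplace=True),
                nn.Conv2d(mid, mid, kernel_size=3, padding=1,
                          groups=mid, bias=False),                  # 3x3 depthwise
                nn.BatchNorm2d(mid),
                nn.ReLU6(inplace=True),
                nn.Conv2d(mid, out_ch, kernel_size=1, bias=False),  # 1x1 projection
                nn.BatchNorm2d(out_ch),
            )
            self.use_residual = in_ch == out_ch

        def forward(self, x):
            y = self.block(x)
            return x + y if self.use_residual else y

    x = torch.randn(1, 32, 56, 56)
    print(BottleneckBlock()(x).shape)  # torch.Size([1, 32, 56, 56])

Note that the expanded intermediate feature maps (mid channels) are large relative to the block's weights, which is why such blocks stress memory bandwidth rather than arithmetic.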
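The bandwidth problem targeted by the paper can be illustrated with a back-of-the-envelope calculation: executing the block above layer by layer spills the large expanded intermediate maps to DRAM, while fusing the layers keeps them in on-chip buffers. The sketch below compares the two extremes under the same illustrative sizes; the paper's partial fusion strategy lies between these extremes, and the figures here are assumptions rather than results from the paper.

    # Back-of-the-envelope DRAM-traffic comparison (illustrative, not
    # figures from the paper) for one MobileNetV2-style block.
    H = W = 56; in_ch = out_ch = 32; mid = in_ch * 6   # expanded channels
    fmap = lambda c: H * W * c                          # elements per feature map

    # Layer-by-layer execution: every intermediate map is written to and
    # read back from DRAM.
    unfused = (fmap(in_ch)            # read block input
               + 2 * fmap(mid)        # write + read expanded map
               + 2 * fmap(mid)        # write + read depthwise output
               + fmap(out_ch))        # write block output

    # Fully fused execution: only the block's input and output touch DRAM;
    # intermediates stay on chip.
    fused = fmap(in_ch) + fmap(out_ch)

    print(f"unfused: {unfused} elements, fused: {fused} elements, "
          f"ratio: {unfused / fused:.1f}x")   # 13.0x less traffic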
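Likewise, the Winograd algorithm employed by the paper trades multiplications for cheap additions. The following worked 1-D F(2,3) example uses the standard Lavin-Gray transform matrices, which may differ in detail from the paper's variant: two outputs of a 3-tap convolution are computed with 4 multiplications instead of 6, and the result is checked against direct convolution.

    # Worked 1-D Winograd F(2,3) example (standard transform matrices;
    # an illustrative sketch, not code from the paper).
    import numpy as np

    BT = np.array([[1,  0, -1,  0],
                   [0,  1,  1,  0],
                   [0, -1,  1,  0],
                   [0,  1,  0, -1]], dtype=float)
    G = np.array([[1.0,  0.0, 0.0],
                  [0.5,  0.5, 0.5],
                  [0.5, -0.5, 0.5],
                  [0.0,  0.0, 1.0]])
    AT = np.array([[1, 1,  1,  0],
                   [0, 1, -1, -1]], dtype=float)

    def winograd_f23(d, g):
        """Two outputs of correlating a 4-tap input d with a 3-tap filter g."""
        U = G @ g            # transform the filter (precomputable offline)
        V = BT @ d           # transform the input tile
        M = U * V            # 4 element-wise multiplications
        return AT @ M        # inverse transform -> 2 outputs

    d = np.array([1.0, 2.0, 3.0, 4.0])
    g = np.array([0.5, 1.0, -1.0])
    direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                       d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
    print(winograd_f23(d, g), direct)  # both: [-0.5  0.]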