Weiguang Chen, Z. Wang, Shanliao Li, Zhibin Yu, Huijuan Li
2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 519-522, July 2019. DOI: 10.1109/ISVLSI.2019.00099
Citations: 4
Abstract
Recent advances in convolutional neural networks (CNNs) reveal a trend toward compact structures such as MobileNet, which adopt variations of traditional computing kernels such as pointwise and depthwise convolution. These modified operations significantly reduce model size with only a slight degradation in inference accuracy. State-of-the-art neural accelerators have not yet fully exploited the algorithmic parallelism of such computing kernels in compact CNNs. In this work, we propose a multithreaded data streaming architecture for fast, highly parallel execution of pointwise and depthwise convolution, which can also be dynamically reconfigured to process conventional convolution, pooling, and fully connected layers. The architecture achieves efficient memory bandwidth utilization by exploiting two modes of data alignment. We profile MobileNet on the proposed architecture and demonstrate a 9.36× speed-up over a single-threaded architecture.
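The model-size reduction the abstract attributes to pointwise and depthwise convolution can be illustrated with a simple parameter count. The sketch below is not from the paper; it only shows why factoring a standard k×k convolution into a depthwise k×k step plus a pointwise 1×1 step (the MobileNet-style decomposition) shrinks the number of weights. The channel sizes are hypothetical.

```python
# Hypothetical illustration: parameter counts for a standard convolution
# vs. the depthwise-separable factorization (depthwise k x k + pointwise
# 1 x 1) used by MobileNet-style compact CNNs.

def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    # A standard convolution mixes space and channels in one kernel:
    # each of the c_out filters spans k x k x c_in weights.
    return k * k * c_in * c_out

def separable_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    # Depthwise step: one k x k spatial filter per input channel.
    depthwise = k * k * c_in
    # Pointwise step: a 1 x 1 convolution that mixes channels.
    pointwise = c_in * c_out
    return depthwise + pointwise

if __name__ == "__main__":
    c_in, c_out = 256, 256  # assumed layer widths, for illustration only
    std = standard_conv_params(c_in, c_out)   # 3*3*256*256 = 589824
    sep = separable_conv_params(c_in, c_out)  # 9*256 + 256*256 = 67840
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For these example widths the factorization cuts the weight count by roughly 8.7×, which is the kind of compaction that motivates accelerator support for these kernels.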