{"title":"捷径挖掘:利用DCNN加速器中的跨层捷径重用","authors":"Arash AziziMazreah, Lizhong Chen","doi":"10.1109/HPCA.2019.00030","DOIUrl":null,"url":null,"abstract":"Off-chip memory traffic has been a major performance bottleneck in deep learning accelerators. While reusing on-chip data is a promising way to reduce off-chip traffic, the opportunity on reusing shortcut connection data in deep networks (e.g., residual networks) have been largely neglected. Those shortcut data accounts for nearly 40% of the total feature map data. In this paper, we propose Shortcut Mining, a novel approach that “mines” the unexploited opportunity of on-chip data reusing. We introduce the abstraction of logical buffers to address the lack of flexibility in existing buffer architecture, and then propose a sequence of procedures which, collectively, can effectively reuse both shortcut and non-shortcut feature maps. The proposed procedures are also able to reuse shortcut data across any number of intermediate layers without using additional buffer resources. Experiment results from prototyping on FPGAs show that, the proposed Shortcut Mining achieves 53.3%, 58%, and 43% reduction in off-chip feature map traffic for SqueezeNet, ResNet-34, and ResNet152, respectively and a 1.93X increase in throughput compared with a state-of-the-art accelerator.","PeriodicalId":102050,"journal":{"name":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"33","resultStr":"{\"title\":\"Shortcut Mining: Exploiting Cross-Layer Shortcut Reuse in DCNN Accelerators\",\"authors\":\"Arash AziziMazreah, Lizhong Chen\",\"doi\":\"10.1109/HPCA.2019.00030\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Off-chip memory traffic has been a major performance bottleneck in deep learning accelerators. While reusing on-chip data is a promising way to reduce off-chip traffic, the opportunity on reusing shortcut connection data in deep networks (e.g., residual networks) have been largely neglected. Those shortcut data accounts for nearly 40% of the total feature map data. In this paper, we propose Shortcut Mining, a novel approach that “mines” the unexploited opportunity of on-chip data reusing. We introduce the abstraction of logical buffers to address the lack of flexibility in existing buffer architecture, and then propose a sequence of procedures which, collectively, can effectively reuse both shortcut and non-shortcut feature maps. The proposed procedures are also able to reuse shortcut data across any number of intermediate layers without using additional buffer resources. Experiment results from prototyping on FPGAs show that, the proposed Shortcut Mining achieves 53.3%, 58%, and 43% reduction in off-chip feature map traffic for SqueezeNet, ResNet-34, and ResNet152, respectively and a 1.93X increase in throughput compared with a state-of-the-art accelerator.\",\"PeriodicalId\":102050,\"journal\":{\"name\":\"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"volume\":\"112 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"33\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2019.00030\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2019.00030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Shortcut Mining: Exploiting Cross-Layer Shortcut Reuse in DCNN Accelerators
Off-chip memory traffic has been a major performance bottleneck in deep learning accelerators. While reusing on-chip data is a promising way to reduce off-chip traffic, the opportunity on reusing shortcut connection data in deep networks (e.g., residual networks) have been largely neglected. Those shortcut data accounts for nearly 40% of the total feature map data. In this paper, we propose Shortcut Mining, a novel approach that “mines” the unexploited opportunity of on-chip data reusing. We introduce the abstraction of logical buffers to address the lack of flexibility in existing buffer architecture, and then propose a sequence of procedures which, collectively, can effectively reuse both shortcut and non-shortcut feature maps. The proposed procedures are also able to reuse shortcut data across any number of intermediate layers without using additional buffer resources. Experiment results from prototyping on FPGAs show that, the proposed Shortcut Mining achieves 53.3%, 58%, and 43% reduction in off-chip feature map traffic for SqueezeNet, ResNet-34, and ResNet152, respectively and a 1.93X increase in throughput compared with a state-of-the-art accelerator.