Shortcut Mining: Exploiting Cross-Layer Shortcut Reuse in DCNN Accelerators

Arash AziziMazreah, Lizhong Chen
{"title":"捷径挖掘:利用DCNN加速器中的跨层捷径重用","authors":"Arash AziziMazreah, Lizhong Chen","doi":"10.1109/HPCA.2019.00030","DOIUrl":null,"url":null,"abstract":"Off-chip memory traffic has been a major performance bottleneck in deep learning accelerators. While reusing on-chip data is a promising way to reduce off-chip traffic, the opportunity on reusing shortcut connection data in deep networks (e.g., residual networks) have been largely neglected. Those shortcut data accounts for nearly 40% of the total feature map data. In this paper, we propose Shortcut Mining, a novel approach that “mines” the unexploited opportunity of on-chip data reusing. We introduce the abstraction of logical buffers to address the lack of flexibility in existing buffer architecture, and then propose a sequence of procedures which, collectively, can effectively reuse both shortcut and non-shortcut feature maps. The proposed procedures are also able to reuse shortcut data across any number of intermediate layers without using additional buffer resources. Experiment results from prototyping on FPGAs show that, the proposed Shortcut Mining achieves 53.3%, 58%, and 43% reduction in off-chip feature map traffic for SqueezeNet, ResNet-34, and ResNet152, respectively and a 1.93X increase in throughput compared with a state-of-the-art accelerator.","PeriodicalId":102050,"journal":{"name":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"33","resultStr":"{\"title\":\"Shortcut Mining: Exploiting Cross-Layer Shortcut Reuse in DCNN Accelerators\",\"authors\":\"Arash AziziMazreah, Lizhong Chen\",\"doi\":\"10.1109/HPCA.2019.00030\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Off-chip memory traffic has been a major performance bottleneck in deep learning accelerators. While reusing on-chip data is a promising way to reduce off-chip traffic, the opportunity on reusing shortcut connection data in deep networks (e.g., residual networks) have been largely neglected. Those shortcut data accounts for nearly 40% of the total feature map data. In this paper, we propose Shortcut Mining, a novel approach that “mines” the unexploited opportunity of on-chip data reusing. We introduce the abstraction of logical buffers to address the lack of flexibility in existing buffer architecture, and then propose a sequence of procedures which, collectively, can effectively reuse both shortcut and non-shortcut feature maps. The proposed procedures are also able to reuse shortcut data across any number of intermediate layers without using additional buffer resources. 
Experiment results from prototyping on FPGAs show that, the proposed Shortcut Mining achieves 53.3%, 58%, and 43% reduction in off-chip feature map traffic for SqueezeNet, ResNet-34, and ResNet152, respectively and a 1.93X increase in throughput compared with a state-of-the-art accelerator.\",\"PeriodicalId\":102050,\"journal\":{\"name\":\"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"volume\":\"112 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"33\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2019.00030\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2019.00030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 33

Abstract

Off-chip memory traffic has been a major performance bottleneck in deep learning accelerators. While reusing on-chip data is a promising way to reduce off-chip traffic, the opportunity of reusing shortcut connection data in deep networks (e.g., residual networks) has been largely neglected. Such shortcut data accounts for nearly 40% of the total feature map data. In this paper, we propose Shortcut Mining, a novel approach that "mines" this unexploited opportunity for on-chip data reuse. We introduce the abstraction of logical buffers to address the lack of flexibility in existing buffer architectures, and then propose a sequence of procedures which, collectively, can effectively reuse both shortcut and non-shortcut feature maps. The proposed procedures are also able to reuse shortcut data across any number of intermediate layers without using additional buffer resources. Experimental results from prototyping on FPGAs show that the proposed Shortcut Mining achieves 53.3%, 58%, and 43% reductions in off-chip feature map traffic for SqueezeNet, ResNet-34, and ResNet-152, respectively, and a 1.93X increase in throughput compared with a state-of-the-art accelerator.
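
To see where the shortcut data discussed in the abstract comes from, the Python sketch below shows a basic residual block in the style of ResNet: the block's input feature map is the shortcut, and it must remain available while the intermediate layers execute because it is added back to their output at the end. The conv_like helper and the tensor sizes are hypothetical placeholders chosen for illustration only; they are not the paper's or the accelerator's implementation.

import numpy as np

def conv_like(x, scale):
    # Stand-in for a convolution layer (hypothetical placeholder, not the
    # accelerator's compute): a simple scaling followed by ReLU that
    # preserves the feature-map shape.
    return np.maximum(scale * x, 0.0)

def residual_block(x):
    # Basic residual block: the input feature map is the shortcut and is
    # element-wise added to the output of the intermediate layers, so it
    # must stay live across those layers.
    shortcut = x                 # the data Shortcut Mining targets for on-chip reuse
    y = conv_like(x, 0.5)        # intermediate layer 1
    y = conv_like(y, 1.5)        # intermediate layer 2
    return y + shortcut          # residual addition with the shortcut

feature_map = np.random.rand(1, 64, 56, 56).astype(np.float32)  # NCHW layout
out = residual_block(feature_map)
print(out.shape)  # (1, 64, 56, 56)

In a layer-by-layer accelerator, the shortcut feature map produced before the intermediate layers would normally be spilled off chip and fetched again for the addition, which is the traffic the paper targets.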
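The abstract also refers to an abstraction of logical buffers. The sketch below is a minimal, speculative illustration of one way such an abstraction could avoid data movement: physical on-chip banks are addressed through logical roles (input, output, shortcut), and a layer transition remaps roles rather than copying feature maps, so shortcut data stays resident across intermediate layers without extra buffers. The class, role names, and remapping policy are assumptions made for this example and are not taken from the paper.

class LogicalBuffers:
    # Speculative illustration only: logical buffer roles mapped onto
    # physical on-chip banks, with reuse achieved by remapping roles.
    def __init__(self, num_banks):
        self.banks = {i: None for i in range(num_banks)}   # placeholder SRAM banks
        self.map = {"input": 0, "output": 1, "shortcut": 2}  # role -> bank index

    def write(self, role, data):
        self.banks[self.map[role]] = data

    def read(self, role):
        return self.banks[self.map[role]]

    def advance_layer(self):
        # After a layer finishes, its output logically becomes the next
        # layer's input: swap roles instead of moving any data.
        self.map["input"], self.map["output"] = self.map["output"], self.map["input"]

bufs = LogicalBuffers(num_banks=3)
bufs.write("shortcut", "fmap_A")   # shortcut kept on chip across layers
bufs.write("input", "fmap_A")
bufs.write("output", "fmap_B")     # intermediate layer produces fmap_B
bufs.advance_layer()               # fmap_B is now the input; no copy, no off-chip traffic
assert bufs.read("input") == "fmap_B"
assert bufs.read("shortcut") == "fmap_A"  # still resident for the residual addition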