POSTER: Design Space Exploration for Performance Optimization of Deep Neural Networks on Shared Memory Accelerators

Swagath Venkataramani, Jungwook Choi, V. Srinivasan, K. Gopalakrishnan, Leland Chang

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), September 2017. DOI: 10.1109/PACT.2017.39
The growing prominence and computational challenges of Deep Neural Networks (DNNs) have fueled the design of specialized accelerator architectures and associated dataflows to improve their implementation efficiency. Each of these solutions serves as a data point on the throughput vs. energy trade-off for a given DNN and a set of architectural constraints. In this paper, we set out to systematically explore this design space so as to estimate a given DNN's performance (both inference and training) on a shared-memory architecture specification using a variety of dataflows. To this end, we have developed a framework, DEEPMATRIX, which, given a description of a DNN and a hardware architecture, automatically identifies how the computations of the DNN's layers should be partitioned and mapped onto the architecture so that overall performance is maximized while meeting the constraints imposed by the hardware (processing power, memory capacity, bandwidth, etc.). We demonstrate DEEPMATRIX's effectiveness on the VGG DNN benchmark, showing the trade-offs and the sensitivity of utilization to different architecture constraints.
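The abstract does not describe DEEPMATRIX's internals, so the sketch below is only a minimal, hypothetical illustration of the kind of constrained design-space exploration it alludes to: exhaustively enumerating tile sizes for one convolutional layer, rejecting mappings whose working set exceeds on-chip memory, and ranking the rest with a roofline-style cycle estimate. All names (Arch, ConvLayer, explore), the cost model, and the architecture parameters are assumptions for illustration, not the paper's actual method.

```python
# Toy design-space exploration for mapping one conv layer onto a hypothetical
# shared-memory accelerator. Not DEEPMATRIX; an illustrative sketch only.
from dataclasses import dataclass
from itertools import product


@dataclass
class Arch:
    peak_macs_per_cycle: int      # processing power
    onchip_buffer_bytes: int      # memory capacity
    dram_bytes_per_cycle: float   # external bandwidth


@dataclass
class ConvLayer:
    n: int                        # batch size
    c: int                        # input channels
    k: int                        # output channels
    h: int                        # output height
    w: int                        # output width
    r: int                        # filter height
    s: int                        # filter width
    bytes_per_elem: int = 2       # e.g. fp16


def estimate_cycles(layer: ConvLayer, arch: Arch, th: int, tw: int, tk: int):
    """Roofline-style estimate for an output tile (th x tw x tk).
    Returns (total_cycles, utilization) or None if the tile violates capacity."""
    # Simplified per-tile working set: input patch, weight slice, output tile.
    in_bytes = layer.c * (th + layer.r - 1) * (tw + layer.s - 1) * layer.bytes_per_elem
    wt_bytes = tk * layer.c * layer.r * layer.s * layer.bytes_per_elem
    out_bytes = tk * th * tw * layer.bytes_per_elem
    if in_bytes + wt_bytes + out_bytes > arch.onchip_buffer_bytes:
        return None  # violates on-chip memory capacity constraint

    macs = tk * th * tw * layer.c * layer.r * layer.s
    compute_cycles = macs / arch.peak_macs_per_cycle
    transfer_cycles = (in_bytes + wt_bytes + out_bytes) / arch.dram_bytes_per_cycle
    tile_cycles = max(compute_cycles, transfer_cycles)  # compute- or bandwidth-bound

    num_tiles = layer.n * -(-layer.h // th) * -(-layer.w // tw) * -(-layer.k // tk)
    total_cycles = num_tiles * tile_cycles
    total_macs = layer.n * layer.k * layer.h * layer.w * layer.c * layer.r * layer.s
    utilization = (total_macs / arch.peak_macs_per_cycle) / total_cycles
    return total_cycles, utilization


def explore(layer: ConvLayer, arch: Arch):
    """Brute-force search over candidate tile sizes; keep the fastest feasible mapping."""
    best = None
    for th, tw, tk in product([7, 14, 28, 56], [7, 14, 28, 56], [16, 32, 64, 128]):
        est = estimate_cycles(layer, arch, th, tw, tk)
        if est and (best is None or est[0] < best[0]):
            best = (est[0], est[1], (th, tw, tk))
    return best


if __name__ == "__main__":
    # A VGG-style 3x3 conv layer and a made-up accelerator configuration.
    layer = ConvLayer(n=1, c=128, k=256, h=56, w=56, r=3, s=3)
    arch = Arch(peak_macs_per_cycle=1024, onchip_buffer_bytes=512 * 1024,
                dram_bytes_per_cycle=32.0)
    print(explore(layer, arch))  # (cycles, utilization, (th, tw, tk))
```

Sweeping the Arch parameters (buffer size, bandwidth, peak MACs) in this toy model mimics, at a very small scale, the kind of sensitivity study the abstract reports for utilization under different architecture constraints.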