Design space exploration for layer-parallel execution of convolutional neural networks on CGRAs
C. Heidorn, Frank Hannig, J. Teich
Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems, 2020-05-25
DOI: 10.1145/3378678.3391878 (https://doi.org/10.1145/3378678.3391878)
Citations: 5
Abstract
In this work, we systematically explore the design space of throughput, energy, and hardware cost for layer-parallel mappings of Convolutional Neural Networks (CNNs) onto coarse-grained reconfigurable arrays (CGRAs). We derive an analytical model that computes the resources (processing elements) and buffer memory, and thus the hardware cost C, required to sustain a given throughput T, as well as the resulting overall energy consumption E for inference. Further, we propose an efficient design space exploration (DSE) to determine the fronts of Pareto-optimal (T,E,C) solutions. This exploration helps to determine the scalability limits of the presented tiled CGRA accelerator architectures in terms of throughput, the number of layers that can be processed in parallel, and memory requirements. Finally, we evaluate the energy savings achievable on our architecture in comparison to implementations that execute a CNN sequentially, layer by layer. Experiments show that, for MobileNet, layer-parallel processing reduces energy consumption E by 3.6X and hardware cost C by 1.2X, and increases the achievable throughput T by 6.2X.
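To make the notion of a Pareto-optimal (T,E,C) front concrete, the following is a minimal sketch of a generic dominance filter. It is not the exploration procedure proposed in the paper, which the abstract does not describe in detail. It assumes throughput T is maximized while energy E and cost C are minimized, and the names `Design`, `dominates`, and `pareto_front`, along with all numeric values, are hypothetical illustration data.

```python
from typing import List, NamedTuple


class Design(NamedTuple):
    """One candidate mapping; all values below are made-up illustration data."""
    throughput: float  # T, higher is better (assumption)
    energy: float      # E, lower is better (assumption)
    cost: float        # C, lower is better (assumption)


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = (a.throughput >= b.throughput
                and a.energy <= b.energy
                and a.cost <= b.cost)
    strictly_better = (a.throughput > b.throughput
                       or a.energy < b.energy
                       or a.cost < b.cost)
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep the designs that no other design dominates (brute force, O(n^2))."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs)]


# Hypothetical candidate points, not results from the paper.
candidates = [
    Design(throughput=10.0, energy=5.0, cost=100.0),
    Design(throughput=20.0, energy=4.0, cost=120.0),
    Design(throughput=20.0, energy=6.0, cost=130.0),  # dominated by the second point
    Design(throughput=5.0, energy=5.5, cost=110.0),   # dominated by the first point
]
front = pareto_front(candidates)
```

With these sample points, the first two candidates remain on the front: neither beats the other in all three objectives, so they represent a genuine trade-off between throughput and cost.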