Hardware-Aware NAS Framework with Layer Adaptive Scheduling on Embedded System
Chuxi Li, Xiaoya Fan, Shengbing Zhang, Zhao Yang, Miao Wang, Danghui Wang, Meng Zhang
2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC), published January 18, 2021
DOI: 10.1145/3394885.3431536
Citations: 2
Abstract
Neural Architecture Search (NAS) has proven to be an effective solution for automatically building Deep Convolutional Neural Network (DCNN) models. Consequently, several hardware-aware NAS frameworks incorporate hardware latency into the search objectives to avoid the risk that the searched network cannot be deployed efficiently on target platforms. However, a mismatch between NAS and hardware persists, because the applicability of the searched network's layer characteristics to the hardware mapping is not reconsidered. A convolutional neural network layer can be executed under various hardware dataflows with different performance, since the on-chip data-reuse characteristics vary to fit the parallel structure. This mismatch causes significant performance degradation for some maladaptive layers obtained from NAS, which might achieve much better latency if the adopted dataflow were changed. To address the problem that network latency alone is insufficient to evaluate deployment efficiency, this paper proposes a novel hardware-aware NAS framework that considers the adaptability between layers and dataflow patterns. Besides, we develop an optimized layer-adaptive data scheduling strategy as well as a coarse-grained reconfigurable computing architecture, so that the searched networks can be deployed with high power efficiency by selecting the most appropriate dataflow pattern layer by layer under limited resources. Evaluation results show that the proposed NAS framework can search DCNNs with accuracy similar to state-of-the-art models as well as low inference latency, and that the proposed architecture provides both power-efficiency improvement and energy-consumption savings.
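The core idea of the abstract — choosing the most appropriate dataflow pattern layer by layer and folding the resulting latency into the search objective — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the dataflow names, latency numbers, and the `nas_score` penalty form are all assumptions; in the actual framework the per-layer costs would come from a hardware cost model or measurements on the reconfigurable architecture.

```python
# Hypothetical sketch of layer-adaptive dataflow scheduling inside a
# hardware-aware NAS objective. All numbers and names are illustrative.

# Per-layer latency (ms) of each candidate dataflow on the target accelerator.
# In practice these entries come from a measured or analytical cost table.
layer_latency = [
    {"weight_stationary": 1.8, "output_stationary": 2.4, "row_stationary": 2.1},
    {"weight_stationary": 3.0, "output_stationary": 2.2, "row_stationary": 2.6},
    {"weight_stationary": 0.9, "output_stationary": 1.1, "row_stationary": 0.7},
]

def schedule_layers(latency_tables):
    """Pick the lowest-latency dataflow independently for each layer,
    returning the per-layer schedule and the total inference latency."""
    schedule, total = [], 0.0
    for table in latency_tables:
        best = min(table, key=table.get)  # dataflow with minimal latency
        schedule.append(best)
        total += table[best]
    return schedule, total

def nas_score(accuracy, total_latency, alpha=0.05):
    """Hardware-aware search objective: accuracy penalized by latency.
    The linear penalty and alpha value are assumptions for illustration."""
    return accuracy - alpha * total_latency

schedule, total = schedule_layers(layer_latency)
print(schedule)  # per-layer dataflow choices under the adaptive schedule
print(total)     # total latency, lower than any single fixed dataflow
```

Note that fixing one dataflow for the whole network (the mismatch the paper describes) gives a strictly worse total here: any single column of the table sums to more than the layer-wise minimum, which is why a maladaptive layer can dominate the latency of an otherwise well-searched network.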