Hardware-Aware NAS Framework with Layer Adaptive Scheduling on Embedded System
Chuxi Li, Xiaoya Fan, Shengbing Zhang, Zhao Yang, Miao Wang, Danghui Wang, Meng Zhang
2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC), published January 18, 2021
DOI: 10.1145/3394885.3431536
Citations: 2
Abstract
Neural Architecture Search (NAS) has proven to be an effective solution for automatically building Deep Convolutional Neural Network (DCNN) models. Consequently, several hardware-aware NAS frameworks incorporate hardware latency into the search objectives to avoid the risk that the searched network cannot be deployed efficiently on target platforms. However, a mismatch between NAS and hardware persists, because the applicability of the searched network's layer characteristics to the hardware mapping is not reconsidered. A convolutional neural network layer can be executed under various hardware dataflows with different performance, since the on-chip data-reuse characteristics vary to fit the parallel structure. This mismatch causes significant performance degradation for some maladaptive layers obtained from NAS, which might achieve much better latency if the adopted dataflow were changed. To address the problem that network latency alone is insufficient to evaluate deployment efficiency, this paper proposes a novel hardware-aware NAS framework that considers the adaptability between layers and dataflow patterns. Besides, we develop an optimized layer-adaptive data scheduling strategy as well as a coarse-grained reconfigurable computing architecture, so that the searched networks can be deployed with high power efficiency by selecting the most appropriate dataflow pattern layer by layer under limited resources. Evaluation results show that the proposed NAS framework can search DCNNs with accuracy similar to state-of-the-art models as well as low inference latency, and that the proposed architecture provides both power-efficiency improvement and energy-consumption savings.
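The core idea of the abstract — choosing the most appropriate dataflow pattern layer by layer and folding the resulting latency into the search objective — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the dataflow names, latency numbers, and the `nas_score` penalty form are all assumptions; in the actual framework the per-layer costs would come from a hardware cost model or measurements on the reconfigurable architecture.

```python
# Hypothetical sketch of layer-adaptive dataflow scheduling inside a
# hardware-aware NAS objective. All numbers and names are illustrative.

# Per-layer latency (ms) of each candidate dataflow on the target accelerator.
# In practice these entries come from a measured or analytical cost table.
layer_latency = [
    {"weight_stationary": 1.8, "output_stationary": 2.4, "row_stationary": 2.1},
    {"weight_stationary": 3.0, "output_stationary": 2.2, "row_stationary": 2.6},
    {"weight_stationary": 0.9, "output_stationary": 1.1, "row_stationary": 0.7},
]

def schedule_layers(latency_tables):
    """Pick the lowest-latency dataflow independently for each layer,
    returning the per-layer schedule and the total inference latency."""
    schedule, total = [], 0.0
    for table in latency_tables:
        best = min(table, key=table.get)  # dataflow with minimal latency
        schedule.append(best)
        total += table[best]
    return schedule, total

def nas_score(accuracy, total_latency, alpha=0.05):
    """Hardware-aware search objective: accuracy penalized by latency.
    The linear penalty and alpha value are assumptions for illustration."""
    return accuracy - alpha * total_latency

schedule, total = schedule_layers(layer_latency)
print(schedule)  # per-layer dataflow choices under the adaptive schedule
print(total)     # total latency, lower than any single fixed dataflow
```

Note that fixing one dataflow for the whole network (the mismatch the paper describes) gives a strictly worse total here: any single column of the table sums to more than the layer-wise minimum, which is why a maladaptive layer can dominate the latency of an otherwise well-searched network.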