Jeet Dutta, Swarnava Dey, Arijit Mukherjee, Arpan Pal
{"title":"Acceleration-aware, Retraining-free Evolutionary Pruning for Automated Fitment of Deep Learning Models on Edge Devices","authors":"Jeet Dutta, Swarnava Dey, Arijit Mukherjee, Arpan Pal","doi":"10.1145/3564121.3564133","DOIUrl":null,"url":null,"abstract":"Deep Learning architectures used in computer vision, natural language and speech processing, unsupervised clustering, etc. have become highly complex and application-specific in recent times. Despite existing automated feature engineering techniques, building such complex models still requires extensive domain knowledge or a huge infrastructure for employing techniques such as Neural Architecture Search (NAS). Further, many industrial applications need in-premises decision-making close to sensors, thus making deployment of deep learning models on edge devices a desirable and often necessary option. Instead of freshly designing application-specific Deep Learning models, the transformation of already built models can achieve faster time to market and cost reduction. In this work, we present an efficient re-training-free model compression method that searches for the best hyper-parameters to reduce the model size and latency without losing any accuracy. Moreover, our proposed method takes into account any drop in accuracy due to hardware acceleration, when a Deep Neural Network is executed on accelerator hardware.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Second International Conference on AI-ML Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3564121.3564133","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
Deep Learning architectures used in computer vision, natural language and speech processing, unsupervised clustering, etc. have become highly complex and application-specific in recent times. Despite existing automated feature engineering techniques, building such complex models still requires extensive domain knowledge or a huge infrastructure for employing techniques such as Neural Architecture Search (NAS). Further, many industrial applications need in-premises decision-making close to sensors, thus making deployment of deep learning models on edge devices a desirable and often necessary option. Instead of freshly designing application-specific Deep Learning models, the transformation of already built models can achieve faster time to market and cost reduction. In this work, we present an efficient re-training-free model compression method that searches for the best hyper-parameters to reduce the model size and latency without losing any accuracy. Moreover, our proposed method takes into account any drop in accuracy due to hardware acceleration, when a Deep Neural Network is executed on accelerator hardware.