Hybrid Optimization for DNN Model Compression and Inference Acceleration

N. Kulkarni, Nidhi Singh, Yamini Joshi, Nikhil Hasabi, S. Meena, Uday Kulkarni, Sunil V. Gurlahosur
{"title":"DNN模型压缩和推理加速的混合优化","authors":"N. Kulkarni, Nidhi Singh, Yamini Joshi, Nikhil Hasabi, S. Meena, Uday Kulkarni, Sunil V. Gurlahosur","doi":"10.1109/CONIT55038.2022.9847977","DOIUrl":null,"url":null,"abstract":"Deep Neural Networks are known for their applications in the domains like computer vision, natural language processing, speech recognition, pattern recognition etc. Though these models are incredibly powerful they consume a considerable amount of memory bandwidth, storage and other computational resources. These heavy models can be successfully executed on machines with CPU/GPU/TPU support. It becomes difficult for the embedded devices to execute them as they are computationally constrained. In order to ease the deployment of these models onto the embedded devices we need to optimize them. Optimization of the model refers to the decrease in model size without compromising with the performance such as model accuracy, number of flops, and model parameters. We present a hybrid optimisation method to address this problem. Hybrid optimization is a 2-phase technique, pruning followed by quantization. Pruning is the process of eliminating inessential weights and connections in order to reduce the model size. Once the unnecessary parameters are removed, the weights of the remaining parameters are converted into 8-bit integer value and is termed quantization of the model. We verify and validate the performance of this hybrid optimization technique for image classification task on the CIFAR-10 dataset. We performed hybrid optimization process for 3 heavy weight models in this work namely ResNet56, ResNet110 and GoogleNet. On an average, the difference in number of flops and parameters is 40%. The reduction in number of parameters and flops has negligible effect on model performance and the variation in accuracy is less than 2%. Further, the optimized model is deployed on edge devices and embedded platform, NVIDIA Jetson TX2 Module.","PeriodicalId":270445,"journal":{"name":"2022 2nd International Conference on Intelligent Technologies (CONIT)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hybrid Optimization for DNN Model Compression and Inference Acceleration\",\"authors\":\"N. Kulkarni, Nidhi Singh, Yamini Joshi, Nikhil Hasabi, S. Meena, Uday Kulkarni, Sunil V. Gurlahosur\",\"doi\":\"10.1109/CONIT55038.2022.9847977\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep Neural Networks are known for their applications in the domains like computer vision, natural language processing, speech recognition, pattern recognition etc. Though these models are incredibly powerful they consume a considerable amount of memory bandwidth, storage and other computational resources. These heavy models can be successfully executed on machines with CPU/GPU/TPU support. It becomes difficult for the embedded devices to execute them as they are computationally constrained. In order to ease the deployment of these models onto the embedded devices we need to optimize them. Optimization of the model refers to the decrease in model size without compromising with the performance such as model accuracy, number of flops, and model parameters. We present a hybrid optimisation method to address this problem. Hybrid optimization is a 2-phase technique, pruning followed by quantization. 
Pruning is the process of eliminating inessential weights and connections in order to reduce the model size. Once the unnecessary parameters are removed, the weights of the remaining parameters are converted into 8-bit integer value and is termed quantization of the model. We verify and validate the performance of this hybrid optimization technique for image classification task on the CIFAR-10 dataset. We performed hybrid optimization process for 3 heavy weight models in this work namely ResNet56, ResNet110 and GoogleNet. On an average, the difference in number of flops and parameters is 40%. The reduction in number of parameters and flops has negligible effect on model performance and the variation in accuracy is less than 2%. Further, the optimized model is deployed on edge devices and embedded platform, NVIDIA Jetson TX2 Module.\",\"PeriodicalId\":270445,\"journal\":{\"name\":\"2022 2nd International Conference on Intelligent Technologies (CONIT)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-06-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 2nd International Conference on Intelligent Technologies (CONIT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CONIT55038.2022.9847977\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 2nd International Conference on Intelligent Technologies (CONIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CONIT55038.2022.9847977","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Deep Neural Networks are known for their applications in domains such as computer vision, natural language processing, speech recognition, and pattern recognition. Although these models are incredibly powerful, they consume a considerable amount of memory bandwidth, storage, and other computational resources. Such heavy models can be executed successfully on machines with CPU/GPU/TPU support, but embedded devices struggle to run them because they are computationally constrained. To ease the deployment of these models onto embedded devices, we need to optimize them. Model optimization refers to reducing model size without compromising performance measures such as accuracy, number of FLOPs, and parameter count. We present a hybrid optimization method to address this problem. Hybrid optimization is a two-phase technique: pruning followed by quantization. Pruning is the process of eliminating inessential weights and connections in order to reduce model size. Once the unnecessary parameters are removed, the weights of the remaining parameters are converted into 8-bit integer values; this step is termed quantization of the model. We verify and validate the performance of this hybrid optimization technique on the image classification task using the CIFAR-10 dataset. We applied the hybrid optimization process to three heavyweight models: ResNet56, ResNet110, and GoogLeNet. On average, the reduction in the number of FLOPs and parameters is 40%. This reduction has a negligible effect on model performance, with the variation in accuracy being less than 2%. Further, the optimized models are deployed on an edge device and embedded platform, the NVIDIA Jetson TX2 module.
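
The abstract gives no implementation details, but the two-phase pipeline it describes maps naturally onto standard framework utilities. Below is a minimal sketch assuming PyTorch: magnitude-based unstructured pruning via torch.nn.utils.prune, followed by post-training dynamic 8-bit quantization. The function name, the 40% pruning amount, the choice of dynamic quantization, and the set of quantized layer types are illustrative assumptions, not the paper's reported configuration.

```python
# Illustrative sketch of the two-phase pipeline described in the abstract
# (not the authors' code): prune inessential weights, then quantize the
# remaining weights to 8-bit integers.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def hybrid_optimize(model: nn.Module, sparsity: float = 0.4) -> nn.Module:
    """Phase 1: pruning; Phase 2: quantization (assumed configuration)."""
    # Phase 1 -- zero out the smallest-magnitude weights in each conv/linear
    # layer, then fold the pruning mask into the weight tensor permanently.
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            prune.l1_unstructured(module, name="weight", amount=sparsity)
            prune.remove(module, "weight")

    # Phase 2 -- convert the surviving weights to 8-bit integers. Dynamic
    # post-training quantization is used here for brevity; a static or
    # hardware-specific flow (e.g. TensorRT on the Jetson TX2) may match
    # the paper's deployment more closely.
    model.eval()
    return torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )
```

Two caveats on this sketch: unstructured pruning zeroes weights without shrinking tensors, so the FLOP reductions the paper reports suggest a structured (filter-level) pruning scheme instead; and for convolution-heavy models such as ResNet56 and GoogLeNet, static quantization with a calibration pass over CIFAR-10 would be needed to quantize the Conv2d layers, since PyTorch's dynamic quantization covers only a limited set of layer types.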