Efficient knowledge distillation of teacher model to multiple student models

Thrivikram Gl, Vidya Ganesh, T. Sethuraman, Satheesh K. Perepu
Published in: 2021 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT)
Publication date: 2021-07-27
DOI: https://doi.org/10.1109/IAICT52856.2021.9532543
Citations: 0

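Abstract

Deep learning models have been shown to capture complex non-linear relationships between a set of input features and different task outputs. However, they are memory intensive and demand substantial compute for both training and inference. The literature offers a range of model compression techniques that make deployment on edge devices easier. Knowledge distillation is one such approach, in which the knowledge of a complex teacher model is transferred to a student model with fewer parameters. Its limitation is that the architecture of the student model must be comparable to that of the teacher for the knowledge transfer to work well. As a result, a student model that learns from a large, complex teacher cannot itself be deployed on edge devices. In this work, we propose a combined-student approach in which different student models learn from a common teacher model. We further propose a loss function that trains multiple student models simultaneously. An advantage of this approach is that the student models can be as simple as possible relative to both a traditional single student model and the complex teacher model. Finally, we present an extensive evaluation showing that the approach significantly improves overall accuracy and allows a further 10% compression compared with a generic model.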
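The abstract does not spell out the proposed loss function, so the following is only a minimal sketch of how several students might be trained against one teacher at once. It assumes the standard temperature-scaled distillation loss (KL divergence to the teacher's softened outputs plus cross-entropy on the true label), summed over students. The function names, the `temperature` and `alpha` parameters, and the summation itself are illustrative assumptions, not the authors' formulation.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def multi_student_loss(teacher_logits, students_logits, label,
                       temperature=2.0, alpha=0.5):
    """Sum of per-student distillation losses for one training example.

    Assumption: each student contributes
        alpha * T^2 * KL(teacher_soft || student_soft)
        + (1 - alpha) * cross_entropy(student, label),
    which is the generic distillation loss, not the paper's own.
    """
    teacher_soft = softmax(teacher_logits, temperature)
    total = 0.0
    for logits in students_logits:
        student_soft = softmax(logits, temperature)
        student_hard = softmax(logits)
        distill = temperature ** 2 * kl_divergence(teacher_soft, student_soft)
        hard = -math.log(student_hard[label])
        total += alpha * distill + (1 - alpha) * hard
    return total


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]          # hypothetical teacher logits
    students = [[0.5, 0.2, 0.1],       # hypothetical logits of two small students
                [1.0, 0.0, 0.0]]
    print(multi_student_loss(teacher, students, label=0))
```

Because the per-student terms are simply added, a single backward pass over this scalar would update all students together in an autodiff framework. Whether the paper couples the students more tightly than this, for example by combining their outputs before comparing with the teacher, cannot be determined from the abstract.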