Two-Stage Approach for Targeted Knowledge Transfer in Self-Knowledge Distillation

IEEE/CAA Journal of Automatica Sinica · Impact Factor 15.3 · CAS Tier 1 (Computer Science) · JCR Q1 (Automation & Control Systems) · Pub Date: 2024-10-08 · DOI: 10.1109/JAS.2024.124629
Zimo Yin;Jian Pu;Yijie Zhou;Xiangyang Xue
{"title":"Two-Stage Approach for Targeted Knowledge Transfer in Self-Knowledge Distillation","authors":"Zimo Yin;Jian Pu;Yijie Zhou;Xiangyang Xue","doi":"10.1109/JAS.2024.124629","DOIUrl":null,"url":null,"abstract":"Knowledge distillation (KD) enhances student network generalization by transferring dark knowledge from a complex teacher network. To optimize computational expenditure and memory utilization, self-knowledge distillation (SKD) extracts dark knowledge from the model itself rather than an external teacher network. However, previous SKD methods performed distillation indiscriminately on full datasets, overlooking the analysis of representative samples. In this work, we present a novel two-stage approach to providing targeted knowledge on specific samples, named two-stage approach self-knowledge distillation (TOAST). We first soften the hard targets using class medoids generated based on logit vectors per class. Then, we iteratively distill the under-trained data with past predictions of half the batch size. The two-stage knowledge is linearly combined, efficiently enhancing model performance. Extensive experiments conducted on five backbone architectures show our method is model-agnostic and achieves the best generalization performance. Besides, TOAST is strongly compatible with existing augmentation-based regularization methods. Our method also obtains a speedup of up to 2.95x compared with a recent state-of-the-art method.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"11 11","pages":"2270-2283"},"PeriodicalIF":15.3000,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ieee-Caa Journal of Automatica Sinica","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10707699/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Knowledge distillation (KD) enhances student network generalization by transferring dark knowledge from a complex teacher network. To reduce computational cost and memory usage, self-knowledge distillation (SKD) extracts dark knowledge from the model itself rather than from an external teacher network. However, previous SKD methods performed distillation indiscriminately on full datasets, overlooking the analysis of representative samples. In this work, we present a novel two-stage approach that provides targeted knowledge for specific samples, named two-stage approach self-knowledge distillation (TOAST). We first soften the hard targets using class medoids generated from the logit vectors of each class. Then, we iteratively distill the under-trained data with past predictions of half the batch size. The two-stage knowledge is linearly combined, efficiently enhancing model performance. Extensive experiments on five backbone architectures show that our method is model-agnostic and achieves the best generalization performance. Moreover, TOAST is strongly compatible with existing augmentation-based regularization methods. Our method also obtains a speedup of up to 2.95x over a recent state-of-the-art method.
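The abstract only outlines the mechanism, so the sketch below illustrates, in PyTorch, how a TOAST-style loss could plausibly be assembled: per-class medoids of logit vectors soften the hard targets (stage one), a buffer of past predictions covering half the batch distills under-trained samples (stage two), and the two terms are combined linearly with the cross-entropy loss. This is a minimal sketch under stated assumptions, not the authors' implementation; the helper names (class_medoids, toast_loss) and the hyper-parameters alpha, beta, and T are hypothetical.

# Hypothetical sketch of a TOAST-style two-stage self-distillation loss.
# Assumptions: medoid-softened targets (stage 1) plus replay of past predictions (stage 2),
# linearly combined with cross-entropy; not the authors' released implementation.
import torch
import torch.nn.functional as F

def class_medoids(logits: torch.Tensor, labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Per-class medoid of logit vectors: the member with minimal total L2 distance to its class."""
    medoids = torch.zeros(num_classes, logits.size(1), device=logits.device)
    for c in range(num_classes):
        members = logits[labels == c]
        if members.numel() == 0:
            continue
        total_dist = torch.cdist(members, members).sum(dim=1)  # distance of each member to the rest
        medoids[c] = members[total_dist.argmin()]
    return medoids

def toast_loss(logits, labels, medoids, past_probs, alpha=0.5, beta=0.5, T=4.0):
    """Cross-entropy plus a linear combination of the two distillation terms (alpha, beta, T assumed)."""
    ce = F.cross_entropy(logits, labels)

    # Stage 1: soften the hard targets toward each sample's class-medoid distribution.
    medoid_probs = F.softmax(medoids[labels] / T, dim=1)
    stage1 = F.kl_div(F.log_softmax(logits / T, dim=1), medoid_probs,
                      reduction="batchmean") * T * T

    # Stage 2: distill against stored past predictions covering half the batch
    # (hypothetical pairing; the paper's exact selection of under-trained samples is not shown here).
    half = past_probs.size(0)
    stage2 = F.kl_div(F.log_softmax(logits[:half] / T, dim=1), past_probs,
                      reduction="batchmean") * T * T

    return ce + alpha * stage1 + beta * stage2

In a training loop, past_probs would be refreshed each iteration with the detached softmax outputs of half of the current batch, so that the next batch is distilled against the model's own recent predictions rather than an external teacher.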
Source Journal

IEEE/CAA Journal of Automatica Sinica (Engineering: Control and Systems Engineering)
CiteScore: 23.50
Self-citation rate: 11.00%
Annual publication volume: 880 articles
About the Journal

The IEEE/CAA Journal of Automatica Sinica is a reputable journal that publishes high-quality papers in English on original theoretical/experimental research and development in the field of automation. The journal covers a wide range of topics including automatic control, artificial intelligence and intelligent control, systems theory and engineering, pattern recognition and intelligent systems, automation engineering and applications, information processing and information systems, network-based automation, robotics, sensing and measurement, and navigation, guidance, and control. Additionally, the journal is abstracted/indexed in several prominent databases including SCIE (Science Citation Index Expanded), EI (Engineering Index), Inspec, Scopus, SCImago, DBLP, CNKI (China National Knowledge Infrastructure), CSCD (Chinese Science Citation Database), and IEEE Xplore.