Two-Stage Approach for Targeted Knowledge Transfer in Self-Knowledge Distillation

IF 15.3 | CAS Region 1 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | IEEE/CAA Journal of Automatica Sinica | Pub Date: 2024-10-08 | DOI: 10.1109/JAS.2024.124629
Zimo Yin;Jian Pu;Yijie Zhou;Xiangyang Xue
{"title":"自我知识提炼中的两阶段定向知识转移法","authors":"Zimo Yin;Jian Pu;Yijie Zhou;Xiangyang Xue","doi":"10.1109/JAS.2024.124629","DOIUrl":null,"url":null,"abstract":"Knowledge distillation (KD) enhances student network generalization by transferring dark knowledge from a complex teacher network. To optimize computational expenditure and memory utilization, self-knowledge distillation (SKD) extracts dark knowledge from the model itself rather than an external teacher network. However, previous SKD methods performed distillation indiscriminately on full datasets, overlooking the analysis of representative samples. In this work, we present a novel two-stage approach to providing targeted knowledge on specific samples, named two-stage approach self-knowledge distillation (TOAST). We first soften the hard targets using class medoids generated based on logit vectors per class. Then, we iteratively distill the under-trained data with past predictions of half the batch size. The two-stage knowledge is linearly combined, efficiently enhancing model performance. Extensive experiments conducted on five backbone architectures show our method is model-agnostic and achieves the best generalization performance. Besides, TOAST is strongly compatible with existing augmentation-based regularization methods. Our method also obtains a speedup of up to 2.95x compared with a recent state-of-the-art method.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"11 11","pages":"2270-2283"},"PeriodicalIF":15.3000,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Two-Stage Approach for Targeted Knowledge Transfer in Self-Knowledge Distillation\",\"authors\":\"Zimo Yin;Jian Pu;Yijie Zhou;Xiangyang Xue\",\"doi\":\"10.1109/JAS.2024.124629\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Knowledge distillation (KD) enhances student network generalization by transferring dark knowledge from a complex teacher network. To optimize computational expenditure and memory utilization, self-knowledge distillation (SKD) extracts dark knowledge from the model itself rather than an external teacher network. However, previous SKD methods performed distillation indiscriminately on full datasets, overlooking the analysis of representative samples. In this work, we present a novel two-stage approach to providing targeted knowledge on specific samples, named two-stage approach self-knowledge distillation (TOAST). We first soften the hard targets using class medoids generated based on logit vectors per class. Then, we iteratively distill the under-trained data with past predictions of half the batch size. The two-stage knowledge is linearly combined, efficiently enhancing model performance. Extensive experiments conducted on five backbone architectures show our method is model-agnostic and achieves the best generalization performance. Besides, TOAST is strongly compatible with existing augmentation-based regularization methods. 
Our method also obtains a speedup of up to 2.95x compared with a recent state-of-the-art method.\",\"PeriodicalId\":54230,\"journal\":{\"name\":\"Ieee-Caa Journal of Automatica Sinica\",\"volume\":\"11 11\",\"pages\":\"2270-2283\"},\"PeriodicalIF\":15.3000,\"publicationDate\":\"2024-10-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Ieee-Caa Journal of Automatica Sinica\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10707699/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ieee-Caa Journal of Automatica Sinica","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10707699/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Knowledge distillation (KD) enhances student network generalization by transferring dark knowledge from a complex teacher network. To optimize computational expenditure and memory utilization, self-knowledge distillation (SKD) extracts dark knowledge from the model itself rather than an external teacher network. However, previous SKD methods performed distillation indiscriminately on full datasets, overlooking the analysis of representative samples. In this work, we present a novel two-stage approach to providing targeted knowledge on specific samples, named two-stage approach self-knowledge distillation (TOAST). We first soften the hard targets using class medoids generated based on logit vectors per class. Then, we iteratively distill the under-trained data with past predictions of half the batch size. The two-stage knowledge is linearly combined, efficiently enhancing model performance. Extensive experiments conducted on five backbone architectures show our method is model-agnostic and achieves the best generalization performance. Besides, TOAST is strongly compatible with existing augmentation-based regularization methods. Our method also obtains a speedup of up to 2.95x compared with a recent state-of-the-art method.
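The abstract describes the method only at a high level. For intuition, the snippet below is a minimal PyTorch sketch of how such a two-stage, targeted self-distillation loss could be assembled: class medoids soften the hard targets, and a buffer of the model's own past predictions supplies the second knowledge source. The function names, the medoid selection rule, the half-batch buffer, and the weighting hyper-parameters (alpha, temperature) are assumptions for illustration, not the authors' released implementation.

```python
# A minimal PyTorch sketch of the two-stage idea described in the abstract. The
# function names, the medoid computation, the half-batch prediction buffer, and
# the loss weighting (alpha, temperature) are assumptions made for illustration;
# they are not taken from the paper's official implementation.
import torch
import torch.nn.functional as F


def class_medoid_logits(logits: torch.Tensor, labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Stage 1 (assumed): for each class, pick the logit vector whose summed L2
    distance to the other logit vectors of that class is smallest (the medoid)."""
    logits = logits.detach()                          # targets carry no gradient
    medoids = logits.new_zeros(num_classes, logits.size(1))
    for c in range(num_classes):
        class_logits = logits[labels == c]
        if class_logits.numel() == 0:                 # class absent from this batch
            continue
        dists = torch.cdist(class_logits, class_logits).sum(dim=1)
        medoids[c] = class_logits[dists.argmin()]
    return medoids


def toast_style_loss(student_logits, labels, medoids, past_probs=None,
                     temperature=4.0, alpha=0.5):
    """Linearly combine (assumed weighting) the two stages of self-knowledge:
    (1) hard targets softened by class medoids, (2) the model's own past
    predictions on samples that are still under-trained."""
    # Stage 1: medoid-softened target distribution for each sample's true class.
    soft_targets = F.softmax(medoids[labels] / temperature, dim=1)
    stage1 = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                      soft_targets, reduction="batchmean") * temperature ** 2

    # Stage 2: distill from stored past predictions, e.g. a buffer holding
    # probabilities for half the batch; skipped while no history exists.
    stage2 = 0.0
    if past_probs is not None:
        k = past_probs.size(0)
        stage2 = F.kl_div(F.log_softmax(student_logits[:k] / temperature, dim=1),
                          past_probs, reduction="batchmean") * temperature ** 2

    ce = F.cross_entropy(student_logits, labels)      # standard supervised term
    return ce + alpha * stage1 + (1.0 - alpha) * stage2
```

In a training loop, `class_medoid_logits` would be refreshed from the model's current logits and `past_probs` filled with softmax outputs stored from earlier iterations for samples the model still handles poorly; both of these scheduling details are likewise assumptions in this sketch.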
Source journal
IEEE/CAA Journal of Automatica Sinica (Engineering - Control and Systems Engineering)
CiteScore: 23.50
Self-citation rate: 11.00%
Articles published: 880
About the journal: The IEEE/CAA Journal of Automatica Sinica is a reputable journal that publishes high-quality papers in English on original theoretical/experimental research and development in the field of automation. The journal covers a wide range of topics including automatic control, artificial intelligence and intelligent control, systems theory and engineering, pattern recognition and intelligent systems, automation engineering and applications, information processing and information systems, network-based automation, robotics, sensing and measurement, and navigation, guidance, and control. Additionally, the journal is abstracted/indexed in several prominent databases including SCIE (Science Citation Index Expanded), EI (Engineering Index), Inspec, Scopus, SCImago, DBLP, CNKI (China National Knowledge Infrastructure), CSCD (Chinese Science Citation Database), and IEEE Xplore.
Latest articles in this journal
Cas-FNE: Cascaded Face Normal Estimation; Front cover; Inside back cover; Inside front cover; Back cover