Boosting accuracy of student models via Masked Adaptive Self-Distillation

IF 6.5 · JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · CAS Region 2 (Computer Science) · Neurocomputing · Pub Date: 2025-07-07 · Epub Date: 2025-03-26 · DOI: 10.1016/j.neucom.2025.129988
Haoran Zhao, Shuwen Tian, Jinlong Wang, Zhaopeng Deng, Xin Sun, Junyu Dong
{"title":"Boosting accuracy of student models via Masked Adaptive Self-Distillation","authors":"Haoran Zhao ,&nbsp;Shuwen Tian ,&nbsp;Jinlong Wang ,&nbsp;Zhaopeng Deng ,&nbsp;Xin Sun ,&nbsp;Junyu Dong","doi":"10.1016/j.neucom.2025.129988","DOIUrl":null,"url":null,"abstract":"<div><div>Knowledge distillation (KD) has achieved impressive success, yet conventional KD approaches are time-consuming and computationally costly. In contrast, self-distillation methods provide an efficient alternative. However, existing self-distillation methods mostly suffer from information redundancy due to the same network architecture from the teacher and student models. Additionally, they simultaneously face the inherent limitation of lacking a high-capacity teacher model. To cope with the above challenges, we propose a novel and efficient method named Masked Adaptive Self-Distillation (MASD). Specifically, we first introduce the Mask Generation Module, which masks random pixels of the feature maps and force it to reconstruct and refine more valuable features on different layers. Moreover, the Adaptive Weighting Mechanism is designed to dynamically adjust and optimize the weights of supervisory signals utilizing the probabilities from the mutual masked supervisory signals, thereby compensating the absence of high-capacity teacher model. We demonstrate the effectiveness of our MASD method on conventional image classification datasets and fine-grained datasets using state-of-the-art CNN architectures, and show that MASD significantly enhances the generalization of various backbone networks. For instance, on the CIFAR-100 classification benchmark, the proposed MASD method achieves an accuracy of 80.40% with the ResNet-18 architecture, surpassing the baseline with a 4.16% margin in Top-1 accuracy.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"637 ","pages":"Article 129988"},"PeriodicalIF":6.5000,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225006605","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/3/26 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Knowledge distillation (KD) has achieved impressive success, yet conventional KD approaches are time-consuming and computationally costly. Self-distillation methods provide an efficient alternative. However, most existing self-distillation methods suffer from information redundancy because the teacher and student models share the same network architecture, and they face the inherent limitation of lacking a high-capacity teacher model. To cope with these challenges, we propose a novel and efficient method named Masked Adaptive Self-Distillation (MASD). Specifically, we first introduce the Mask Generation Module, which masks random pixels of the feature maps and forces the network to reconstruct and refine more valuable features at different layers. Moreover, an Adaptive Weighting Mechanism is designed to dynamically adjust and optimize the weights of the supervisory signals using the probabilities derived from the mutually masked supervisory signals, thereby compensating for the absence of a high-capacity teacher model. We demonstrate the effectiveness of MASD on conventional image classification datasets and fine-grained datasets using state-of-the-art CNN architectures, and show that it significantly enhances the generalization of various backbone networks. For instance, on the CIFAR-100 classification benchmark, MASD achieves 80.40% Top-1 accuracy with the ResNet-18 architecture, surpassing the baseline by a margin of 4.16%.
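
The abstract describes the two components only at a high level. As a rough illustration (not the authors' released implementation), the following PyTorch sketch shows what random pixel masking of a feature map and probability-based weighting of two mutually masked supervisory signals could look like; the function names, the mask ratio, and the exact weighting rule are assumptions made for illustration.

    import torch
    import torch.nn.functional as F

    def mask_feature_map(feat: torch.Tensor, mask_ratio: float = 0.5) -> torch.Tensor:
        """Zero out a random subset of spatial positions in a (B, C, H, W) feature
        map, so a later branch is forced to reconstruct the missing information.
        The 0.5 default ratio is an assumption, not a value from the paper."""
        b, _, h, w = feat.shape
        keep = (torch.rand(b, 1, h, w, device=feat.device) > mask_ratio).float()
        return feat * keep  # broadcast the (B, 1, H, W) mask over all channels

    def adaptive_weights(logits_a: torch.Tensor, logits_b: torch.Tensor,
                         labels: torch.Tensor) -> torch.Tensor:
        """One plausible weighting rule: weight each of two supervisory signals
        by the probability its branch assigns to the ground-truth class,
        normalized per sample so the two weights sum to one."""
        p_a = F.softmax(logits_a, dim=1).gather(1, labels.unsqueeze(1))  # (B, 1)
        p_b = F.softmax(logits_b, dim=1).gather(1, labels.unsqueeze(1))  # (B, 1)
        w = torch.cat([p_a, p_b], dim=1)                                 # (B, 2)
        return w / w.sum(dim=1, keepdim=True)

For example, mask_feature_map(torch.randn(8, 64, 32, 32)) returns a feature map of the same shape with roughly half of its spatial positions zeroed across all channels, which is the kind of corruption a reconstruction branch would then be trained to undo.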
Source journal

Neurocomputing (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles published per year: 1382
Review time: 70 days

Journal introduction: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.