DIST+: Knowledge Distillation From a Stronger Adaptive Teacher

Tao Huang, Shan You, Fei Wang, Chen Qian, Chang Xu
{"title":"DIST+: Knowledge Distillation From a Stronger Adaptive Teacher","authors":"Tao Huang;Shan You;Fei Wang;Chen Qian;Chang Xu","doi":"10.1109/TPAMI.2025.3554235","DOIUrl":null,"url":null,"abstract":"The paper introduces DIST, an innovative knowledge distillation method that excels in learning from a superior teacher model. DIST differentiates itself from conventional techniques by adeptly handling the often significant prediction discrepancies between the student and teacher models. It achieves this by focusing on maintaining the relationships between their predictions, implementing a correlation-based loss to explicitly capture the teacher's intrinsic inter-class relations. Moreover, DIST uniquely considers the semantic similarities between different instances and each class at the intra-class level. The method is further enhanced by two significant improvements: (1) A teacher acclimation strategy, which effectively reduces the discrepancy between teacher and student, thereby optimizing the distillation process. (2) An extension of the DIST loss from the logit level to the feature level, a modification that proves especially beneficial for dense prediction tasks. DIST stands out for its simplicity, practicality, and adaptability to various architectures, model sizes, and training strategies. It consistently delivers state-of-the-art results across a range of applications, including image classification, object detection, and semantic segmentation.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 7","pages":"5571-5585"},"PeriodicalIF":18.6000,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10938241/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

This paper introduces DIST, a knowledge distillation method designed for learning from a substantially stronger teacher model. Unlike conventional techniques, DIST explicitly handles the often large prediction discrepancy between student and teacher: rather than forcing the student to match the teacher's outputs exactly, it preserves the relations among their predictions, using a correlation-based loss to capture the teacher's intrinsic inter-class relations. In addition, DIST considers the semantic similarities among different instances for each class, i.e., relations at the intra-class level. The method is further strengthened by two improvements, yielding DIST+: (1) a teacher acclimation strategy that reduces the discrepancy between teacher and student, thereby easing the distillation process; and (2) an extension of the DIST loss from the logit level to the feature level, which proves especially beneficial for dense prediction tasks. DIST is simple, practical, and adaptable to various architectures, model sizes, and training strategies, and it consistently delivers state-of-the-art results across image classification, object detection, and semantic segmentation.
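To make the relational idea concrete, matching relations rather than exact values can be implemented with a Pearson-correlation loss over the softened predictions: one term correlates each instance's prediction across classes (inter-class), and one correlates each class's score across instances in the batch (intra-class). Below is a minimal PyTorch sketch of such a correlation-based loss; the function names, temperature `tau`, and weights `beta`/`gamma` are illustrative assumptions, not the authors' released code.

```python
# A minimal sketch of a DIST-style correlation-based distillation loss.
# Names and default hyperparameters are assumptions for illustration.
import torch
import torch.nn.functional as F


def pearson_distance(a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """1 - Pearson correlation along the last dimension, averaged over rows."""
    a = a - a.mean(dim=-1, keepdim=True)
    b = b - b.mean(dim=-1, keepdim=True)
    a = a / (a.norm(dim=-1, keepdim=True) + eps)
    b = b / (b.norm(dim=-1, keepdim=True) + eps)
    return (1.0 - (a * b).sum(dim=-1)).mean()


def dist_loss(student_logits: torch.Tensor,
              teacher_logits: torch.Tensor,
              tau: float = 4.0,     # softening temperature (assumed value)
              beta: float = 1.0,    # inter-class weight (assumed value)
              gamma: float = 1.0) -> torch.Tensor:  # intra-class weight (assumed)
    """Correlation-based distillation over a batch of logits of shape (B, C).

    Inter-class term: per instance, match how the classes relate to each other.
    Intra-class term: per class, match how the instances relate to each other.
    """
    p_s = F.softmax(student_logits / tau, dim=-1)  # (B, C) softened student predictions
    p_t = F.softmax(teacher_logits / tau, dim=-1)  # (B, C) softened teacher predictions
    inter = pearson_distance(p_s, p_t)             # rows are instances
    intra = pearson_distance(p_s.t(), p_t.t())     # rows are classes
    return beta * inter + gamma * intra


# Toy usage: distill a batch of 8 instances over 10 classes.
if __name__ == "__main__":
    s = torch.randn(8, 10, requires_grad=True)
    t = torch.randn(8, 10)
    loss = dist_loss(s, t)
    loss.backward()
    print(f"DIST-style loss: {loss.item():.4f}")
```

Because Pearson correlation is invariant to shift and scale of each prediction vector, the student may remain differently calibrated (e.g., less confident) than a much stronger teacher so long as the relative relations among classes and among instances are preserved. The feature-level extension described in the abstract would apply the same relational matching to intermediate features rather than logits.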