{"title":"DIST+: Knowledge Distillation From a Stronger Adaptive Teacher","authors":"Tao Huang;Shan You;Fei Wang;Chen Qian;Chang Xu","doi":"10.1109/TPAMI.2025.3554235","DOIUrl":null,"url":null,"abstract":"The paper introduces DIST, an innovative knowledge distillation method that excels in learning from a superior teacher model. DIST differentiates itself from conventional techniques by adeptly handling the often significant prediction discrepancies between the student and teacher models. It achieves this by focusing on maintaining the relationships between their predictions, implementing a correlation-based loss to explicitly capture the teacher's intrinsic inter-class relations. Moreover, DIST uniquely considers the semantic similarities between different instances and each class at the intra-class level. The method is further enhanced by two significant improvements: (1) A teacher acclimation strategy, which effectively reduces the discrepancy between teacher and student, thereby optimizing the distillation process. (2) An extension of the DIST loss from the logit level to the feature level, a modification that proves especially beneficial for dense prediction tasks. DIST stands out for its simplicity, practicality, and adaptability to various architectures, model sizes, and training strategies. It consistently delivers state-of-the-art results across a range of applications, including image classification, object detection, and semantic segmentation.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 7","pages":"5571-5585"},"PeriodicalIF":18.6000,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10938241/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
The paper introduces DIST, a knowledge distillation method designed to learn effectively from a substantially stronger teacher model. DIST differs from conventional techniques in how it handles the often significant prediction discrepancies between the student and teacher: instead of forcing the student to match the teacher's outputs exactly, it preserves the relations among their predictions, using a correlation-based loss to explicitly capture the teacher's intrinsic inter-class relations. DIST further extends this relational matching to the intra-class level, capturing the semantic similarities among different instances for each class. The method is strengthened by two improvements: (1) a teacher acclimation strategy that reduces the discrepancy between teacher and student, thereby easing the distillation process, and (2) an extension of the DIST loss from the logit level to the feature level, which proves especially beneficial for dense prediction tasks. DIST is simple, practical, and adaptable to various architectures, model sizes, and training strategies, and it consistently delivers state-of-the-art results across a range of applications, including image classification, object detection, and semantic segmentation.
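
As a rough illustration of the correlation-based relational matching described above, the sketch below computes an inter-class term (correlation across classes for each instance) and an intra-class term (correlation across instances for each class) between softened student and teacher predictions. This is a minimal PyTorch-style sketch under assumed conventions; the function and hyperparameter names (`dist_loss`, `beta`, `gamma`, `tau`) are illustrative and do not reproduce the authors' reference implementation or the feature-level extension.

```python
import torch
import torch.nn.functional as F

def pearson_corr(a, b, eps=1e-8):
    # Center each row, then take cosine similarity of the centered vectors,
    # which equals the Pearson correlation coefficient row by row.
    a = a - a.mean(dim=-1, keepdim=True)
    b = b - b.mean(dim=-1, keepdim=True)
    return F.cosine_similarity(a, b, dim=-1, eps=eps)

def dist_loss(student_logits, teacher_logits, beta=1.0, gamma=1.0, tau=1.0):
    # Softened class-probability distributions, shape (batch, classes).
    p_s = F.softmax(student_logits / tau, dim=1)
    p_t = F.softmax(teacher_logits / tau, dim=1)

    # Inter-class relation: for each instance, match the correlation of the
    # student's and teacher's predictions across classes.
    inter = (1.0 - pearson_corr(p_s, p_t)).mean()

    # Intra-class relation: for each class, match the correlation of the
    # student's and teacher's predictions across instances in the batch.
    intra = (1.0 - pearson_corr(p_s.t(), p_t.t())).mean()

    return beta * inter + gamma * intra
```

Because correlation is invariant to per-row shift and scale, this relaxed matching tolerates the larger gap between a strong teacher's predictions and the student's, while still transferring the teacher's relational structure; in practice it would be combined with the standard cross-entropy loss on ground-truth labels.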