{"title":"DIST+: Knowledge Distillation From a Stronger Adaptive Teacher","authors":"Tao Huang;Shan You;Fei Wang;Chen Qian;Chang Xu","doi":"10.1109/TPAMI.2025.3554235","DOIUrl":null,"url":null,"abstract":"The paper introduces DIST, an innovative knowledge distillation method that excels in learning from a superior teacher model. DIST differentiates itself from conventional techniques by adeptly handling the often significant prediction discrepancies between the student and teacher models. It achieves this by focusing on maintaining the relationships between their predictions, implementing a correlation-based loss to explicitly capture the teacher's intrinsic inter-class relations. Moreover, DIST uniquely considers the semantic similarities between different instances and each class at the intra-class level. The method is further enhanced by two significant improvements: (1) A teacher acclimation strategy, which effectively reduces the discrepancy between teacher and student, thereby optimizing the distillation process. (2) An extension of the DIST loss from the logit level to the feature level, a modification that proves especially beneficial for dense prediction tasks. DIST stands out for its simplicity, practicality, and adaptability to various architectures, model sizes, and training strategies. It consistently delivers state-of-the-art results across a range of applications, including image classification, object detection, and semantic segmentation.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 7","pages":"5571-5585"},"PeriodicalIF":18.6000,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10938241/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
The paper introduces DIST, a knowledge distillation method designed to learn effectively from a substantially stronger teacher model. DIST differs from conventional techniques in how it handles the often significant prediction discrepancies between the student and teacher: instead of forcing the student to match the teacher's outputs exactly, it preserves the relations among their predictions, using a correlation-based loss to explicitly capture the teacher's intrinsic inter-class relations. DIST further extends this relational matching to the intra-class level, capturing the semantic similarities among different instances for each class. The method is strengthened by two improvements: (1) a teacher acclimation strategy that reduces the discrepancy between teacher and student, thereby easing the distillation process, and (2) an extension of the DIST loss from the logit level to the feature level, which proves especially beneficial for dense prediction tasks. DIST is simple, practical, and adaptable to various architectures, model sizes, and training strategies, and it consistently delivers state-of-the-art results across a range of applications, including image classification, object detection, and semantic segmentation.
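
As a rough illustration of the correlation-based relational matching described above, the sketch below computes an inter-class term (correlation across classes for each instance) and an intra-class term (correlation across instances for each class) between softened student and teacher predictions. This is a minimal PyTorch-style sketch under assumed conventions; the function and hyperparameter names (`dist_loss`, `beta`, `gamma`, `tau`) are illustrative and do not reproduce the authors' reference implementation or the feature-level extension.

```python
import torch
import torch.nn.functional as F

def pearson_corr(a, b, eps=1e-8):
    # Center each row, then take cosine similarity of the centered vectors,
    # which equals the Pearson correlation coefficient row by row.
    a = a - a.mean(dim=-1, keepdim=True)
    b = b - b.mean(dim=-1, keepdim=True)
    return F.cosine_similarity(a, b, dim=-1, eps=eps)

def dist_loss(student_logits, teacher_logits, beta=1.0, gamma=1.0, tau=1.0):
    # Softened class-probability distributions, shape (batch, classes).
    p_s = F.softmax(student_logits / tau, dim=1)
    p_t = F.softmax(teacher_logits / tau, dim=1)

    # Inter-class relation: for each instance, match the correlation of the
    # student's and teacher's predictions across classes.
    inter = (1.0 - pearson_corr(p_s, p_t)).mean()

    # Intra-class relation: for each class, match the correlation of the
    # student's and teacher's predictions across instances in the batch.
    intra = (1.0 - pearson_corr(p_s.t(), p_t.t())).mean()

    return beta * inter + gamma * intra
```

Because correlation is invariant to per-row shift and scale, this relaxed matching tolerates the larger gap between a strong teacher's predictions and the student's, while still transferring the teacher's relational structure; in practice it would be combined with the standard cross-entropy loss on ground-truth labels.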