Channel-Wise Attention and Channel Combination for Knowledge Distillation
C. Han, K. Lee
DOI: 10.1145/3400286.3418273
Proceedings of the International Conference on Research in Adaptive and Convergent Systems
Published: 2020-10-13
Citations: 0
Abstract
Knowledge distillation is a strategy for building machine learning models efficiently by making use of the knowledge embedded in a pretrained model. The teacher-student framework is a well-known approach to knowledge distillation, in which a teacher network contains knowledge for a specific task and a student network with a simpler architecture inherits that knowledge. This paper proposes a new approach that uses an attention mechanism to extract knowledge from a teacher network. The attention function determines which channels of the teacher network's feature maps are used for training the student network, so that the student network learns only useful features. This approach allows a new model to learn useful features while accounting for model complexity.
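The channel-selection idea described above can be sketched as follows. This is a minimal illustrative example, not the paper's exact formulation: it assumes a softmax attention over per-channel teacher activation statistics, which then weights a per-channel L2 distillation loss so that channels deemed informative contribute more to the transfer signal. All function names and the attention scoring rule here are hypothetical.

```python
import numpy as np

def channel_attention(teacher_feat):
    """Per-channel attention over a teacher feature map of shape (C, H, W).

    Assumption: a channel's usefulness is scored by its mean absolute
    activation, normalized with a numerically stable softmax.
    """
    scores = np.abs(teacher_feat).mean(axis=(1, 2))   # (C,)
    exp = np.exp(scores - scores.max())               # stable softmax
    return exp / exp.sum()

def attended_distill_loss(teacher_feat, student_feat):
    """Channel-weighted L2 distillation loss.

    Channels with higher attention contribute more to the loss,
    steering the student toward the teacher's most useful features.
    """
    a = channel_attention(teacher_feat)                           # (C,)
    per_channel = ((teacher_feat - student_feat) ** 2).mean(axis=(1, 2))
    return float((a * per_channel).sum())
```

In a real training loop this loss would be added to the student's task loss, with the teacher's weights frozen; here the feature maps are plain arrays for clarity.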