{"title":"知识升华的渠道关注与渠道组合","authors":"C. Han, K. Lee","doi":"10.1145/3400286.3418273","DOIUrl":null,"url":null,"abstract":"Knowledge distillation is a strategy to build machine learning models efficiently by making use of knowledge embedded in a pretrained model. Teacher-student framework is a well-known one to use knowledge distillation, where a teacher network usually contains knowledge for a specific task and a student network is constructed in a simpler architecture inheriting the knowledge of the teacher network. This paper proposes a new approach that uses an attention mechanism to extract knowledge from a teacher network. The attention function plays the role of determining which channels of feature maps in the teacher network to be used for training the student network so that the student network can only learn useful features. This approach allows a new model to learn useful features considering the model complexity.","PeriodicalId":326100,"journal":{"name":"Proceedings of the International Conference on Research in Adaptive and Convergent Systems","volume":"485 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Channel-Wise Attention and Channel Combination for Knowledge Distillation\",\"authors\":\"C. Han, K. Lee\",\"doi\":\"10.1145/3400286.3418273\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Knowledge distillation is a strategy to build machine learning models efficiently by making use of knowledge embedded in a pretrained model. Teacher-student framework is a well-known one to use knowledge distillation, where a teacher network usually contains knowledge for a specific task and a student network is constructed in a simpler architecture inheriting the knowledge of the teacher network. This paper proposes a new approach that uses an attention mechanism to extract knowledge from a teacher network. The attention function plays the role of determining which channels of feature maps in the teacher network to be used for training the student network so that the student network can only learn useful features. 
This approach allows a new model to learn useful features considering the model complexity.\",\"PeriodicalId\":326100,\"journal\":{\"name\":\"Proceedings of the International Conference on Research in Adaptive and Convergent Systems\",\"volume\":\"485 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the International Conference on Research in Adaptive and Convergent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3400286.3418273\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the International Conference on Research in Adaptive and Convergent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3400286.3418273","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Channel-Wise Attention and Channel Combination for Knowledge Distillation
Knowledge distillation is a strategy for building machine learning models efficiently by reusing the knowledge embedded in a pretrained model. The teacher-student framework is a well-known setting for knowledge distillation: a teacher network contains knowledge for a specific task, and a student network with a simpler architecture is constructed to inherit that knowledge. This paper proposes a new approach that uses an attention mechanism to extract knowledge from a teacher network. The attention function determines which channels of the teacher network's feature maps are used to train the student network, so that the student network learns only useful features. This approach allows a new model to learn useful features while accounting for model complexity.
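The abstract does not give implementation details, but a minimal sketch of the core idea might look as follows: a learned attention function scores each teacher feature-map channel, and the student is trained to match the teacher only on highly weighted channels. All names here (ChannelAttention, distillation_loss) and the squeeze-and-excitation-style gating are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of channel-wise attention for knowledge distillation.
# Assumes teacher and student feature maps share the same shape; a real
# implementation might need a projection layer to align dimensions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Scores each channel of a feature map (squeeze-and-excitation style)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel weight in (0, 1)
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (batch, channels, H, W) -> per-channel weights (batch, channels)
        squeezed = feat.mean(dim=(2, 3))  # global average pooling
        return self.fc(squeezed)

def distillation_loss(teacher_feat, student_feat, attention):
    """Match student features to teacher features, weighted per channel."""
    weights = attention(teacher_feat)              # (B, C)
    weights = weights.unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
    # Weighted per-channel MSE: channels the attention deems uninformative
    # contribute little to the student's training signal.
    return F.mse_loss(weights * student_feat, weights * teacher_feat)
```

Under this reading, the attention module would be trained jointly with the student, and the weighted loss is what restricts the student to the teacher's useful channels.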