{"title":"Improve Fine-grained Visual Classification Accuracy by Controllable Location Knowledge Distillation","authors":"You-Lin Tsai, Cheng-Hung Lin, Po-Yung Chou","doi":"10.1109/ICCE59016.2024.10444242","DOIUrl":null,"url":null,"abstract":"The current state-of-the-art network models have achieved remarkable performance. However, they often face an issue of having excessively large architectures, making them challenging to deploy on edge devices. In response to this challenge, a groundbreaking solution known as knowledge distillation has been introduced. The concept of knowledge distillation involves transferring information from a complex teacher model to a simpler student model, effectively reducing the model’s complexity. Prior approach has demonstrated promising transfer effects through this technique. Nevertheless, in the field of fine-grained image classification, there has been limited exploration of distillation methods custom-tailored specifically for this domain. In this paper, we focus on knowledge distillation specifically designed for fine-grained image recognition. Notably, this strategy is inspired by the Class Activation Maps (CAM). We first train a complex model and use it generated feature maps with spatial information, which is call hint maps. Furthermore, we propose an adjustment strategy for this hint map, which can control local distribution of information. Referring to it as Controllable Size CAM (CTRLS-CAM). Use it as a guide allowing the student model to effectively learn from the teacher model’s behavior and concentrate on discriminate details. In contrast to conventional distillation models, this strategy proves particularly advantageous for fine-grained recognition, enhancing learning outcomes and enabling the student model to achieve superior performance. In CTRLS-CAM method, we refine the hint maps value distribution, redefining the relative relationships between primary and secondary feature areas. We conducted experiments using the CUB200-2011 dataset, and our results demonstrated a significant accuracy improvement of about 5% compared to the original non-distilled student model. Moreover, our approach achieved a 1.23% enhancement over traditional knowledge distillation methods.","PeriodicalId":518694,"journal":{"name":"2024 IEEE International Conference on Consumer Electronics (ICCE)","volume":"96 11","pages":"1-5"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2024 IEEE International Conference on Consumer Electronics (ICCE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCE59016.2024.10444242","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Current state-of-the-art network models have achieved remarkable performance. However, they often suffer from excessively large architectures, making them difficult to deploy on edge devices. In response to this challenge, knowledge distillation has been introduced as a solution. Knowledge distillation transfers information from a complex teacher model to a simpler student model, effectively reducing model complexity. Prior approaches have demonstrated promising transfer effects with this technique. Nevertheless, in the field of fine-grained image classification, distillation methods tailored specifically to this domain remain largely unexplored. In this paper, we focus on knowledge distillation designed specifically for fine-grained image recognition. The strategy is inspired by Class Activation Maps (CAM). We first train a complex teacher model and use it to generate feature maps with spatial information, which we call hint maps. We further propose an adjustment strategy for these hint maps that controls the local distribution of information, and refer to it as Controllable Size CAM (CTRLS-CAM). Using it as a guide allows the student model to effectively learn the teacher model's behavior and concentrate on discriminative details. In contrast to conventional distillation models, this strategy is particularly advantageous for fine-grained recognition, enhancing learning outcomes and enabling the student model to achieve superior performance. In the CTRLS-CAM method, we refine the value distribution of the hint maps, redefining the relative relationship between primary and secondary feature areas. We conducted experiments on the CUB200-2011 dataset, and our results demonstrate a significant accuracy improvement of about 5% compared to the original non-distilled student model. Moreover, our approach achieved a 1.23% enhancement over traditional knowledge distillation methods.
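The abstract does not include an implementation, so the snippet below is only a minimal, hypothetical sketch of the two ingredients it describes: a CAM-style hint map computed from the teacher's final feature maps, and a distillation loss that combines standard soft-label knowledge distillation with a hint-map matching term. The function names, the ground-truth-class CAM weighting, and the weights T, alpha, beta are assumptions for illustration; the specific CTRLS-CAM redistribution of primary versus secondary feature values is not reproduced here.

```python
import torch
import torch.nn.functional as F

def cam_hint_map(features: torch.Tensor, fc_weights: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Build a CAM-style hint map (illustrative sketch, not the paper's exact procedure).

    features:   (B, C, H, W) final convolutional feature maps
    fc_weights: (num_classes, C) weights of the final linear classifier
    labels:     (B,) ground-truth class indices
    """
    w = fc_weights[labels]                                 # (B, C) class-specific channel weights
    cam = torch.einsum("bchw,bc->bhw", features, w)        # weighted sum over channels
    cam = F.relu(cam)
    # Normalize each map to [0, 1] so teacher and student maps are comparable.
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-6)
    return cam

def distillation_loss(student_logits, teacher_logits,
                      student_cam, teacher_cam,
                      labels, T=4.0, alpha=0.5, beta=1.0):
    """Cross-entropy + soft-label KD + hint-map matching (assumed loss combination)."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    # Encourage the student's spatial activations to follow the teacher's hint map.
    hint = F.mse_loss(student_cam, teacher_cam)
    return ce + alpha * kd + beta * hint
```

In this sketch the hint-map term simply matches the student's map to the teacher's; the paper's CTRLS-CAM strategy would additionally reshape the teacher map's value distribution before it is used as the target.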