Title: Efficient knowledge distillation for hybrid models: A vision transformer-convolutional neural network to convolutional neural network approach for classifying remote sensing images
Authors: Huaxiang Song, Yuxuan Yuan, Zhiwei Ouyang, Yu Yang, Hui Xiang
Journal: IET Cybersystems and Robotics, volume 6, issue 3
Published: 2024-07-10
DOI: 10.1049/csy2.12120
URL: https://onlinelibrary.wiley.com/doi/10.1049/csy2.12120
PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/csy2.12120
Impact factor: 1.5 (JCR Q3, Automation & Control Systems)
Citations: 0
Abstract
Knowledge distillation (KD) techniques that use vision transformers (ViTs) and convolutional neural networks (CNNs) together as a hybrid teacher have shown remarkable classification results in various fields. For remote sensing images (RSIs), however, existing KD studies are both scarce and uncompetitive, which significantly impedes exploiting the notable advantages of ViTs and CNNs. To address this, the authors introduce a novel hybrid-model KD approach named HMKD-Net, which comprises a CNN-ViT ensemble teacher and a CNN student. Contrary to popular opinion, the authors posit that the sparsity of the RSI data distribution limits the effectiveness and efficiency of hybrid-model knowledge transfer. As a solution, they propose a simple yet innovative method for handling variances during the KD phase, which substantially improves both the effectiveness and the efficiency of hybrid knowledge transfer. The authors evaluated HMKD-Net on three RSI datasets. The results indicate that HMKD-Net significantly outperforms other cutting-edge methods while being considerably smaller. Specifically, HMKD-Net exceeds other KD-based methods with a maximum accuracy improvement of 22.8% across the evaluated datasets. Ablation experiments indicate that HMKD-Net cuts the time cost of the KD process by about 80%. This study shows that hybrid-model KD can be more effective and efficient if the data distribution sparsity in RSIs is handled well.
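The abstract does not spell out how HMKD-Net computes its distillation loss or how it handles variances, so the following is only a minimal sketch of generic temperature-scaled KD with a two-model ensemble teacher, not the authors' method. It assumes the teacher's soft targets are the plain average of the CNN and ViT softened probabilities; the function names, `temperature`, and `alpha` are hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(cnn_logits, vit_logits, temperature):
    """Assumption: the hybrid teacher is the plain average of the two teachers' soft outputs."""
    p_cnn = softmax(cnn_logits, temperature)
    p_vit = softmax(vit_logits, temperature)
    return [(a + b) / 2.0 for a, b in zip(p_cnn, p_vit)]


def kd_loss(student_logits, cnn_logits, vit_logits, label, temperature=4.0, alpha=0.5):
    """Generic KD loss: alpha * cross-entropy + (1 - alpha) * T^2 * KL(teacher || student).

    `alpha` and `temperature` are illustrative hyperparameters, not values from the paper.
    """
    teacher = ensemble_soft_targets(cnn_logits, vit_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(teacher, student_soft)
        if t > 0.0
    )
    # Hard-label cross-entropy on the student's unsoftened prediction.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl


if __name__ == "__main__":
    # Toy three-class example with made-up logits.
    print(kd_loss([1.0, 0.2, -0.5], [2.0, 0.5, -1.0], [1.0, 1.5, -0.5], label=0))
```

In this sketch the KL term vanishes when the student reproduces the averaged teacher distribution exactly, so only the hard-label term remains; a real training loop would compute the same quantities on batched tensors in a deep learning framework.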