Kang Zheng;Yu Chen;Jingrong Wang;Zhifei Liu;Shuai Bao;Jiao Zhan;Nan Shen
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 4074-4092. Published 2025-01-13. Journal Article.
DOI: 10.1109/JSTARS.2025.3525634
Impact factor: 5.3. JCR: Q1 (Engineering, Electrical & Electronic).
Article page: https://ieeexplore.ieee.org/document/10839278/
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10839278
Enhancing Remote Sensing Semantic Segmentation Accuracy and Efficiency Through Transformer and Knowledge Distillation
In semantic segmentation, the shift from convolutional neural networks (CNNs) to transformers is driven by transformers' superior ability to capture global semantic information in remote sensing images. However, most transformer methods suffer from slow inference and a limited ability to capture local features. To address these issues, this study designs a hybrid approach that combines knowledge distillation with a joint CNN-transformer architecture to improve semantic segmentation of remote sensing images. First, the article proposes the dual-path convolutional transformer network (DP-CTNet), whose dual-path structure leverages the strengths of both CNNs and transformers. It incorporates a feature refinement module to optimize the transformer's feature learning and a feature fusion module to merge CNN and transformer features, which keeps the transformer from learning local features insufficiently. Then, with DP-CTNet serving as the teacher model, pruning and knowledge distillation are used to create an efficient DP-CTNet (EDP-CTNet) with superior segmentation speed and accuracy. Angle knowledge distillation (AKD) is proposed to improve the transfer of DP-CTNet's features during distillation, leading to better EDP-CTNet performance. Experimental results demonstrate that DP-CTNet combines the respective advantages of CNNs and transformers, preserving local detail features while learning extensive sequential semantic information. After AKD training, EDP-CTNet delivers both fast segmentation and high accuracy. Compared with other models, the two proposed models stand out in accuracy and in the visual quality of their results.
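The abstract does not define AKD, so the following is only a rough sketch of what an angle-based distillation term could look like, not the authors' formulation. It assumes the loss compares the pairwise cosine similarities among teacher feature vectors with those among student feature vectors; the function names `pairwise_cosine` and `angle_distillation_loss` and the toy feature values are hypothetical. One property of this particular choice is that the teacher and the pruned student may have different feature widths, because only angles between vectors are compared.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector, eps: float = 1e-12) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def pairwise_cosine(features: Sequence[Vector]) -> List[List[float]]:
    """N x N matrix of cosines between every pair of feature vectors."""
    return [[cosine(u, v) for v in features] for u in features]


def angle_distillation_loss(student: Sequence[Vector],
                            teacher: Sequence[Vector]) -> float:
    """Mean squared gap between student and teacher pairwise cosines.

    Hypothetical stand-in for an angle-based distillation term. The two
    inputs need the same number of vectors, not the same vector width.
    """
    n = len(student)
    if n == 0 or n != len(teacher):
        raise ValueError("need the same non-zero number of feature vectors")
    s = pairwise_cosine(student)
    t = pairwise_cosine(teacher)
    return sum((s[i][j] - t[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


# Toy features: the teacher is 3-dimensional, the students are 2-dimensional.
teacher_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
aligned_student = [[2.0, 0.0], [0.0, 5.0], [3.0, 3.0]]    # same angles as teacher
collapsed_student = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # all vectors identical

aligned_loss = angle_distillation_loss(aligned_student, teacher_feats)
collapsed_loss = angle_distillation_loss(collapsed_student, teacher_feats)
```

In this sketch a student that reproduces the teacher's angular structure gets a loss near zero regardless of vector scale, while a student whose features collapse onto one direction is penalized. In a real training setup such a term would typically be added to the segmentation loss with a weighting factor, which the abstract does not specify.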
About the journal:
The IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing addresses the growing field of applications in Earth observations and remote sensing, and also provides a venue for the rapidly expanding special issues sponsored by the IEEE Geoscience and Remote Sensing Society. The journal draws upon the experience of the highly successful “IEEE Transactions on Geoscience and Remote Sensing” and provides a complementary medium for the wide range of topics in applied Earth observations. The “Applications” areas encompass the societal benefit areas of the Global Earth Observation System of Systems (GEOSS) program. Through deliberations over two years, ministers from 50 countries agreed to identify nine areas where Earth observation could positively impact the quality of life and health of their respective countries. Some of these areas, including biodiversity, health, and climate, are not traditionally addressed in the IEEE context. Yet it is the skill sets of IEEE members, in areas such as observations, communications, computers, signal processing, standards, and ocean engineering, that form the technical underpinnings of GEOSS. Thus, the journal attracts a broad range of interests, serving present members in new ways and expanding IEEE visibility into new areas.