{"title":"用于无人机遥感图像语义分割的混合 CNN 和变压器网络","authors":"Xuanyu Zhou;Lifan Zhou;Shengrong Gong;Haizhen Zhang;Shan Zhong;Yu Xia;Yizhou Huang","doi":"10.1109/JMASS.2023.3332948","DOIUrl":null,"url":null,"abstract":"Semantic segmentation of unmanned aerial vehicle (UAV) remote sensing images is a recent research hotspot, offering technical support for diverse types of UAV remote sensing missions. However, unlike general scene images, UAV remote sensing images present inherent challenges. These challenges include the complexity of backgrounds, substantial variations in target scales, and dense arrangements of small targets, which severely hinder the accuracy of semantic segmentation. To address these issues, we propose a convolutional neural network (CNN) and transformer hybrid network for semantic segmentation of UAV remote sensing images. The proposed network follows an encoder–decoder architecture that merges a transformer-based encoder with a CNN-based decoder. First, we incorporate the Swin transformer as the encoder to address the limitations of CNN in global modeling, mitigating the interference caused by complex background information. Second, to effectively handle the significant changes in target scales, we design the multiscale feature integration module (MFIM) that enhances the multiscale feature representation capability of the network. Finally, the semantic feature fusion module (SFFM) is designed to filter the redundant noise during the feature fusion process, which improves the recognition of small targets and edges. 
Experimental results demonstrate that the proposed method outperforms other popular methods on the UAVid and Aeroscapes datasets.","PeriodicalId":100624,"journal":{"name":"IEEE Journal on Miniaturization for Air and Space Systems","volume":"5 1","pages":"33-41"},"PeriodicalIF":0.0000,"publicationDate":"2023-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hybrid CNN and Transformer Network for Semantic Segmentation of UAV Remote Sensing Images\",\"authors\":\"Xuanyu Zhou;Lifan Zhou;Shengrong Gong;Haizhen Zhang;Shan Zhong;Yu Xia;Yizhou Huang\",\"doi\":\"10.1109/JMASS.2023.3332948\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Semantic segmentation of unmanned aerial vehicle (UAV) remote sensing images is a recent research hotspot, offering technical support for diverse types of UAV remote sensing missions. However, unlike general scene images, UAV remote sensing images present inherent challenges. These challenges include the complexity of backgrounds, substantial variations in target scales, and dense arrangements of small targets, which severely hinder the accuracy of semantic segmentation. To address these issues, we propose a convolutional neural network (CNN) and transformer hybrid network for semantic segmentation of UAV remote sensing images. The proposed network follows an encoder–decoder architecture that merges a transformer-based encoder with a CNN-based decoder. First, we incorporate the Swin transformer as the encoder to address the limitations of CNN in global modeling, mitigating the interference caused by complex background information. Second, to effectively handle the significant changes in target scales, we design the multiscale feature integration module (MFIM) that enhances the multiscale feature representation capability of the network. 
Finally, the semantic feature fusion module (SFFM) is designed to filter the redundant noise during the feature fusion process, which improves the recognition of small targets and edges. Experimental results demonstrate that the proposed method outperforms other popular methods on the UAVid and Aeroscapes datasets.\",\"PeriodicalId\":100624,\"journal\":{\"name\":\"IEEE Journal on Miniaturization for Air and Space Systems\",\"volume\":\"5 1\",\"pages\":\"33-41\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-11-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Journal on Miniaturization for Air and Space Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10319338/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal on Miniaturization for Air and Space Systems","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10319338/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Hybrid CNN and Transformer Network for Semantic Segmentation of UAV Remote Sensing Images
Semantic segmentation of unmanned aerial vehicle (UAV) remote sensing images is a recent research hotspot, offering technical support for diverse types of UAV remote sensing missions. However, unlike general scene images, UAV remote sensing images present inherent challenges. These challenges include the complexity of backgrounds, substantial variations in target scales, and dense arrangements of small targets, which severely hinder the accuracy of semantic segmentation. To address these issues, we propose a convolutional neural network (CNN) and transformer hybrid network for semantic segmentation of UAV remote sensing images. The proposed network follows an encoder–decoder architecture that merges a transformer-based encoder with a CNN-based decoder. First, we incorporate the Swin transformer as the encoder to address the limitations of CNN in global modeling, mitigating the interference caused by complex background information. Second, to effectively handle the significant changes in target scales, we design the multiscale feature integration module (MFIM) that enhances the multiscale feature representation capability of the network. Finally, the semantic feature fusion module (SFFM) is designed to filter the redundant noise during the feature fusion process, which improves the recognition of small targets and edges. Experimental results demonstrate that the proposed method outperforms other popular methods on the UAVid and Aeroscapes datasets.
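The fusion step the abstract describes for the SFFM (combining shallow high-resolution features with deeper semantic features while suppressing redundant noise) can be sketched minimally in NumPy. This is an illustrative assumption about how a gated feature fusion of this kind might operate, not the authors' implementation: the function names, the nearest-neighbor upsampling, and the sigmoid gate are all hypothetical stand-ins for details the abstract does not give.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def gated_fusion(high_res, low_res):
    """Fuse a high-resolution (C, 2H, 2W) detail map with a low-resolution
    (C, H, W) semantic map: a sigmoid gate derived from the upsampled
    semantic features reweights the detail features element-wise, damping
    responses the deeper branch treats as background noise."""
    semantic = upsample2x(low_res)
    gate = sigmoid(semantic)            # per-element weight in (0, 1)
    return gate * high_res + semantic   # gated detail + semantic context

rng = np.random.default_rng(0)
high = rng.standard_normal((4, 8, 8))   # shallow, high-resolution features
low = rng.standard_normal((4, 4, 4))    # deep, semantic features
fused = gated_fusion(high, low)
print(fused.shape)  # (4, 8, 8)
```

In a trained network the gate would be produced by learned convolutions rather than a raw sigmoid of the features, but the shape bookkeeping and the reweight-then-add pattern are the same.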