Hybrid CNN and Transformer Network for Semantic Segmentation of UAV Remote Sensing Images

Xuanyu Zhou;Lifan Zhou;Shengrong Gong;Haizhen Zhang;Shan Zhong;Yu Xia;Yizhou Huang
Journal: IEEE Journal on Miniaturization for Air and Space Systems, vol. 5, no. 1, pp. 33-41
Publication date: 2023-11-15
Publication type: Journal Article
DOI: 10.1109/JMASS.2023.3332948
URL: https://ieeexplore.ieee.org/document/10319338/
Citations: 0

Abstract

Semantic segmentation of unmanned aerial vehicle (UAV) remote sensing images is an active research topic that supports a wide range of UAV remote sensing missions. Unlike general scene images, however, UAV remote sensing images have complex backgrounds, large variations in target scale, and densely arranged small targets, all of which severely reduce segmentation accuracy. To address these issues, we propose a hybrid convolutional neural network (CNN) and transformer network for semantic segmentation of UAV remote sensing images. The network follows an encoder-decoder architecture that pairs a transformer-based encoder with a CNN-based decoder. First, we adopt the Swin transformer as the encoder to compensate for the limited global modeling ability of CNNs, which mitigates interference from complex background information. Second, to handle the large changes in target scale, we design a multiscale feature integration module (MFIM) that strengthens the network's multiscale feature representation. Finally, we design a semantic feature fusion module (SFFM) that filters redundant noise during feature fusion, which improves the recognition of small targets and edges. Experimental results show that the proposed method outperforms other popular methods on the UAVid and Aeroscapes datasets.
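The abstract does not describe the internals of the MFIM, so the following is only a generic illustration of what "multiscale feature integration" can mean, not the authors' module. It is a minimal pure-Python sketch that assumes a single-channel feature map, average pooling at a few hypothetical scales (1, 2, 4), nearest-neighbour upsampling, and a plain average of the branches. All function names are invented for this example.

```python
from typing import List, Sequence

Grid = List[List[float]]


def avg_pool(feature: Grid, k: int) -> Grid:
    """Average-pool a 2D map with a k x k window and stride k."""
    h, w = len(feature), len(feature[0])
    if h % k or w % k:
        raise ValueError("height and width must be divisible by k")
    return [
        [
            sum(feature[i * k + di][j * k + dj]
                for di in range(k) for dj in range(k)) / (k * k)
            for j in range(w // k)
        ]
        for i in range(h // k)
    ]


def upsample_nearest(feature: Grid, k: int) -> Grid:
    """Nearest-neighbour upsampling by an integer factor k."""
    h, w = len(feature), len(feature[0])
    return [[feature[i // k][j // k] for j in range(w * k)] for i in range(h * k)]


def multiscale_merge(feature: Grid, scales: Sequence[int] = (1, 2, 4)) -> Grid:
    """Average the map with coarser, re-upsampled copies of itself.

    Hypothetical stand-in for a multiscale module: each branch sees the
    input at a different resolution, and the branches are merged per pixel.
    """
    h, w = len(feature), len(feature[0])
    branches = [upsample_nearest(avg_pool(feature, k), k) for k in scales]
    return [
        [sum(b[i][j] for b in branches) / len(branches) for j in range(w)]
        for i in range(h)
    ]


# A 4x4 map with one bright "small target" pixel at row 1, column 1.
feature_map = [[0.0] * 4 for _ in range(4)]
feature_map[1][1] = 12.0
merged = multiscale_merge(feature_map)
```

In this toy input the target pixel keeps the largest response (`merged[1][1]` is 5.25), its 2x2 neighbourhood picks up context from the coarser branch (1.25), and distant pixels receive only the global average (0.25). A learned module would replace the fixed average with trainable weights; that choice is left open here because the source does not specify it.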