{"title":"基于多尺度剩余注意的Swin变压器遥感图像语义分割","authors":"Yuanyang Lin, Da-han Wang, Yun Wu, Shunzhi Zhu","doi":"10.1145/3581807.3581827","DOIUrl":null,"url":null,"abstract":"Semantic segmentation of remote sensing images usually faces the problems of unbalanced foreground-background, large variation of object scales, and significant similarity of different classes. The FCN-based fully convolutional encoder-decoder architecture seems to have become the standard for semantic segmentation, and this architecture is also prevalent in remote sensing images. However, because of the limitations of CNN, the encoder cannot obtain global contextual information, which is extraordinarily important to the semantic segmentation of remote sensing images. By contrast, in this paper, the CNN-based encoder is replaced by Swin Transformer to obtain rich global contextual information. Besides, for the CNN-based decoder, we propose a multi-level connection module (MLCM) to fuse high-level and low-level semantic information to help feature maps obtain more semantic information and use a multi-scale upsample module (MSUM) to join the upsampling process to recover the resolution of images better to get segmentation results preferably. The experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate the effectiveness of our proposed method.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Swin Transformer with Multi-Scale Residual Attention for Semantic Segmentation of Remote Sensing Images\",\"authors\":\"Yuanyang Lin, Da-han Wang, Yun Wu, Shunzhi Zhu\",\"doi\":\"10.1145/3581807.3581827\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Semantic segmentation of remote sensing images usually faces the problems of unbalanced foreground-background, large variation of object scales, and significant similarity of different classes. The FCN-based fully convolutional encoder-decoder architecture seems to have become the standard for semantic segmentation, and this architecture is also prevalent in remote sensing images. However, because of the limitations of CNN, the encoder cannot obtain global contextual information, which is extraordinarily important to the semantic segmentation of remote sensing images. By contrast, in this paper, the CNN-based encoder is replaced by Swin Transformer to obtain rich global contextual information. Besides, for the CNN-based decoder, we propose a multi-level connection module (MLCM) to fuse high-level and low-level semantic information to help feature maps obtain more semantic information and use a multi-scale upsample module (MSUM) to join the upsampling process to recover the resolution of images better to get segmentation results preferably. 
The experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate the effectiveness of our proposed method.\",\"PeriodicalId\":292813,\"journal\":{\"name\":\"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition\",\"volume\":\"26 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3581807.3581827\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3581807.3581827","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Semantic segmentation of remote sensing images typically faces three difficulties: imbalance between foreground and background, large variation in object scale, and high visual similarity between different classes. The FCN-style fully convolutional encoder-decoder architecture has become the de facto standard for semantic segmentation and is also widely used for remote sensing imagery. However, owing to the limited receptive field of CNNs, a convolutional encoder cannot capture global contextual information, which is especially important for segmenting remote sensing images. In this paper, we therefore replace the CNN-based encoder with a Swin Transformer to obtain rich global context. In the CNN-based decoder, we propose a multi-level connection module (MLCM) that fuses high-level and low-level semantic information so that the feature maps carry richer semantics, and a multi-scale upsample module (MSUM) that is inserted into the upsampling path to recover image resolution more faithfully and produce better segmentation results. Experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate the effectiveness of the proposed method.
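The abstract names the MLCM and MSUM modules but does not specify their internals, so the following is only a minimal PyTorch sketch of how such decoder components are commonly built: a hypothetical MLCM that fuses a low-level encoder feature with an upsampled high-level feature via 1x1 projections and a 3x3 fusion convolution, and a hypothetical MSUM that applies parallel dilated convolutions before 2x bilinear upsampling. The module designs, channel widths, and the Swin-Tiny feature shapes in the toy check are assumptions for illustration, not the authors' implementation.

    # Hypothetical sketch only: the abstract does not define MLCM/MSUM internals.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MLCM(nn.Module):
        """Multi-level connection module (assumed design): fuse a low-level
        encoder feature with an upsampled high-level feature."""
        def __init__(self, low_ch, high_ch, out_ch):
            super().__init__()
            self.low_proj = nn.Conv2d(low_ch, out_ch, kernel_size=1)
            self.high_proj = nn.Conv2d(high_ch, out_ch, kernel_size=1)
            self.fuse = nn.Sequential(
                nn.Conv2d(2 * out_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )

        def forward(self, low, high):
            # Upsample the high-level feature to the low-level spatial size, then fuse.
            high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                                 align_corners=False)
            return self.fuse(torch.cat([self.low_proj(low), self.high_proj(high)], dim=1))

    class MSUM(nn.Module):
        """Multi-scale upsample module (assumed design): parallel dilated
        convolutions followed by 2x bilinear upsampling."""
        def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d)
                for d in dilations
            )
            self.merge = nn.Sequential(
                nn.Conv2d(len(dilations) * out_ch, out_ch, kernel_size=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )

        def forward(self, x):
            # Aggregate multi-scale context, then recover resolution by a factor of 2.
            x = self.merge(torch.cat([b(x) for b in self.branches], dim=1))
            return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

    if __name__ == "__main__":
        # Toy shape check using feature maps typical of a Swin-Tiny encoder
        # (channels 96/768 at strides 4/32) -- an assumption, not from the paper.
        low = torch.randn(1, 96, 128, 128)    # stride-4 encoder feature
        high = torch.randn(1, 768, 16, 16)    # stride-32 encoder feature
        fused = MLCM(96, 768, 96)(low, high)  # -> (1, 96, 128, 128)
        up = MSUM(96, 96)(fused)              # -> (1, 96, 256, 256)
        print(fused.shape, up.shape)

In this reading, MLCM plays the role of a skip-connection fusion block and MSUM replaces a plain bilinear or transposed-convolution upsampling step; the actual paper may differ in both structure and placement.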