Swin Transformer with Multi-Scale Residual Attention for Semantic Segmentation of Remote Sensing Images

Yuanyang Lin, Da-han Wang, Yun Wu, Shunzhi Zhu
Published in: Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition
Publication date: 2022-11-17
DOI: 10.1145/3581807.3581827 (https://doi.org/10.1145/3581807.3581827)
Citations: 0

Abstract

Semantic segmentation of remote sensing images typically faces three problems: foreground-background imbalance, large variation in object scale, and strong similarity between different classes. The FCN-based fully convolutional encoder-decoder architecture appears to have become the standard for semantic segmentation, and it is also prevalent in remote sensing. However, because of the limitations of CNNs, such an encoder cannot obtain global contextual information, which is extremely important for the semantic segmentation of remote sensing images. In this paper, the CNN-based encoder is therefore replaced by a Swin Transformer to obtain rich global contextual information. For the CNN-based decoder, we propose a multi-level connection module (MLCM) that fuses high-level and low-level semantic information so that the feature maps carry more semantic information, and we add a multi-scale upsample module (MSUM) to the upsampling process to better recover image resolution and produce better segmentation results. Experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate the effectiveness of the proposed method.
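The abstract does not describe the internals of the MLCM or MSUM, so the following is only a minimal sketch of the general idea behind fusing high-level and low-level feature maps in a decoder, not the authors' implementation. It assumes single-channel feature maps stored as nested lists, nearest-neighbor upsampling, and element-wise addition as the fusion step; the names `upsample_nearest` and `fuse` are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map, rows x cols


def upsample_nearest(fmap: Grid, factor: int) -> Grid:
    """Enlarge a feature map by repeating each value factor x factor times."""
    out: Grid = []
    for row in fmap:
        # Repeat every value horizontally, then repeat the widened row vertically.
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def fuse(low: Grid, high: Grid) -> Grid:
    """Upsample the coarse (high-level) map to the fine map's size and add them.

    Assumption: addition stands in for whatever fusion the paper actually uses.
    """
    factor = len(low) // len(high)
    if (factor < 1
            or len(high) * factor != len(low)
            or len(high[0]) * factor != len(low[0])):
        raise ValueError("low-level size must be an integer multiple of high-level size")
    up = upsample_nearest(high, factor)
    return [[a + b for a, b in zip(r_low, r_up)] for r_low, r_up in zip(low, up)]


# Fine-resolution map (local detail) and coarse-resolution map (more context).
low_level = [[1.0, 2.0, 3.0, 4.0],
             [5.0, 6.0, 7.0, 8.0],
             [9.0, 10.0, 11.0, 12.0],
             [13.0, 14.0, 15.0, 16.0]]
high_level = [[10.0, 20.0],
              [30.0, 40.0]]

fused = fuse(low_level, high_level)
for row in fused:
    print(row)
```

In a real decoder the maps would be multi-channel tensors and the fusion would typically involve learned layers; the sketch only shows how a coarse map is brought to the resolution of a fine map before the two are combined.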