STRD-Net: A Dual-Encoder Semantic Segmentation Network for Urban Green Space Extraction
Authors: Mouzhe Yu; Liheng He; Zhehui Shen; Meng Lv
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-13 (Impact Factor 8.6; JCR Q1, Engineering, Electrical & Electronic)
Published: 2024-09-10
DOI: 10.1109/TGRS.2024.3456898
URL: https://ieeexplore.ieee.org/document/10671599/
Citation count: 0
Abstract
Urban green spaces significantly influence how people work and live. Deep learning methods that use a convolutional neural network (CNN) as the encoder have weak global feature extraction capabilities and often miss individual trees or small areas of low vegetation. Transformer-based models have weak local feature extraction capabilities and perform poorly at distinguishing between small categories such as trees and low vegetation. We therefore propose a novel dual-encoder semantic segmentation model, the Swin Transformer and ResNet50 dual-encoder net (STRD-Net), which runs a Swin Transformer (ST) encoder and a CNN encoder in parallel and accepts two images with different channel ratios as input, enabling the model to capture both global and local features. In the ST encoder, a convolutional block attention module (CBAM) is added to the head to suppress the "salt-and-pepper" noise effect in the extraction results. A new patch merging (NPM) module is added after each ST module to further enhance the ST encoder's local feature extraction capabilities for urban green spaces. In the CNN encoder, an enhanced atrous spatial pyramid pooling (EASPP) module is added after the ResNet50 backbone to expand the encoder's receptive field and enhance its global feature extraction capabilities for urban green spaces. The model uses a single skip connection to preserve extraction accuracy while saving computational resources. Results on the Vaihingen and Potsdam datasets indicate that STRD-Net improves both local and global feature extraction in urban green space extraction. The code will be available at https://github.com/learn-zhezhe/STRD-Net.
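The abstract states that the EASPP module expands the receptive field of the CNN encoder, but it does not give the module's dilation rates or layout. As a rough illustration of why atrous (dilated) convolution widens the receptive field, the sketch below computes how many input pixels a single dilated kernel spans. The rates (1, 6, 12, 18) are an assumption borrowed from common ASPP configurations, not values taken from STRD-Net, and the function names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels, covered by one dilated (atrous) kernel.

    A dilation of d inserts d - 1 gaps between neighbouring kernel taps,
    so a k-tap kernel spans k + (k - 1) * (d - 1) input pixels.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Assumed dilation rates for illustration only; the abstract does not
# specify the rates used by the EASPP module.
ASSUMED_RATES = (1, 6, 12, 18)


def branch_spans(kernel_size: int = 3, rates=ASSUMED_RATES) -> dict:
    """Map each assumed dilation rate to the span of one kernel at that rate."""
    return {rate: effective_kernel_size(kernel_size, rate) for rate in rates}


if __name__ == "__main__":
    for rate, span in branch_spans().items():
        print(f"dilation {rate:>2}: 3x3 kernel spans {span}x{span} input pixels")
```

Under these assumed rates, a 3x3 kernel at dilation 18 spans 37x37 input pixels while still using only nine weights, which is the general mechanism by which a pyramid of atrous branches gathers wider context without adding parameters per branch.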
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.