M2-APNet: A multimodal deep learning network to predict major air pollutants from temporal satellite images
Gudiseva Swetha, Rajeshreddy Datla, Chalavadi Vishnu, C. Krishna Mohan
DOI: 10.1117/1.jrs.18.012005

Air quality monitoring plays a vital role in the sustainable development of any country. Continuous monitoring of major air pollutants and forecasting of their variations would help protect the environment and improve public health. However, this task is challenging because ground-based instruments provide air-pollutant observations with limited spatial coverage. We propose a multimodal deep learning network (M2-APNet) to predict major air pollutants at a global scale from multimodal temporal satellite images. The inputs to M2-APNet include satellite imagery, a digital elevation model (DEM), and other key attributes. M2-APNet employs a convolutional neural network to extract local features and a bidirectional long short-term memory network to obtain longitudinal features from the multimodal temporal data. These features are fused to uncover common patterns useful for regression in predicting the major air pollutants and for categorizing the air quality index (AQI). We conducted exhaustive experiments to predict air pollutants and AQI across important regions of India using multiple temporal modalities. The experimental results further demonstrate the effectiveness of the DEM modality over others in learning to predict major air pollutants and determine the AQI.
{"title":"M2-APNet: A multimodal deep learning network to predict major air pollutants from temporal satellite images","authors":"Gudiseva Swetha, Rajeshreddy Datla, Chalavadi Vishnu, C. Krishna Mohan","doi":"10.1117/1.jrs.18.012005","DOIUrl":"https://doi.org/10.1117/1.jrs.18.012005","url":null,"abstract":"Air quality monitoring plays a vital role in the sustainable development of any country. Continuous monitoring of the major air pollutants and forecasting their variations would be helpful in saving the environment and improving the quality of public health. However, this task becomes challenging with the available observations of air pollutants from the on-ground instruments with their limited spatial coverage. We propose a multimodal deep learning network (M2-APNet) to predict major air pollutants at a global scale from multimodal temporal satellite images. The inputs to the proposed M2-APNet include satellite image, digital elevation model (DEM), and other key attributes. The proposed M2-APNet employs a convolutional neural network to extract local features and a bidirectional long short-term memory to obtain longitudinal features from multimodal temporal data. These features are fused to uncover common patterns helpful for regression in predicting the major air pollutants and categorization of air quality index (AQI). We have conducted exhaustive experiments to predict air pollutants and AQI across important regions in India by employing multiple temporal modalities. Further, the experimental results demonstrate the effectiveness of DEM modality over others in learning to predict major air pollutants and determining the AQI.","PeriodicalId":54879,"journal":{"name":"Journal of Applied Remote Sensing","volume":"41 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135271300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MFPWTN: a multi-frequency parallel wavelet transform network for remote sensing image super-resolution
Cong Liu, Changlian Shi
DOI: 10.1117/1.jrs.17.046503

How to fully capture high-frequency information is an important issue in the remote sensing image super-resolution (SR) task. Most existing convolutional neural network based methods apply an attention mechanism to capture high-frequency information. However, this is often insufficient, since remote sensing images usually contain more high-frequency information than natural images. Recently, some studies have transformed the original image into the wavelet domain to capture more high-frequency information. However, we observe that these methods usually apply similar network structures to learn the different wavelet components, which makes it difficult to fully capture their distinct features. To solve this issue, we propose a multi-frequency parallel wavelet transform network (MFPWTN) for remote sensing image SR. Specifically, we design two different network structures to reconstruct the high-frequency and low-frequency wavelet components, fully capturing the characteristics of the different frequencies. We then introduce a high-frequency fusion module to enhance information transmission among the high-frequency wavelet components. In addition, we employ dilated convolution in the network structure that reconstructs the low-frequency wavelet component, which captures different receptive fields with relatively few parameters. Experimental results on two public remote sensing datasets, UCMerced-LandUse and NWPU-RESISC45, show that the proposed MFPWTN outperforms many existing state-of-the-art algorithms.
{"title":"MFPWTN: a multi-frequency parallel wavelet transform network for remote sensing image super-resolution","authors":"Cong Liu, Changlian Shi","doi":"10.1117/1.jrs.17.046503","DOIUrl":"https://doi.org/10.1117/1.jrs.17.046503","url":null,"abstract":"How to fully capture high-frequency information is an important issue in the remote sensing image super-resolution (SR) task. Most of the existing convolutional neural network based methods usually apply the attention mechanism to capture the high-frequency information. However, it is often insufficient since the remote sensing images usually contain more high-frequency information than natural images. Recently, some studies try to transform the original image into the wavelet domain to capture more high-frequency information. However, we observe that these methods usually apply similar network structures to learn different wavelet components, which will be difficult to fully capture the different features. To solve this issue, we propose a method named multi-frequency parallel wavelet transform network (MFPWTN) for remote sensing image SR. Specifically, we initially design two different network structures to reconstruct the high-frequency and low-frequency wavelet components, which can fully capture the characteristics of different frequencies. Subsequently, we introduce a high-frequency fusion module to enhance the information transmission among different high-frequency wavelet components. In addition, we employ the dilated convolution to establish the network structure for reconstructing the low-frequency wavelet component, which allows us to capture different receptive fields by using relatively few parameters. The experimental results on two public remote sensing datasets, UCMerced-LandUse and NWPU-RESISC45, show that the proposed MFPWTN can get superior performance over many existing state-of-the-art algorithms.","PeriodicalId":54879,"journal":{"name":"Journal of Applied Remote Sensing","volume":"72 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135220885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ARCSTONE: calibration of lunar spectral reflectance from space. Prototype instrument concept, analysis, and results
Hans Courrier, Rand Swanson, Constantine Lukashin, Christine Buleri, John Carvo, Michael Cooney, Warren Davis, Alexander Halterman, Alan Hoskins, Trevor Jackson, Mike Kehoe, Greg Kopp, Thuan Nguyen, Noah Ryan, Carlos Roithmayr, Paul Smith, Mike Stebbins, Thomas Stone, Cindy Young
DOI: 10.1117/1.jrs.17.044508

The objective of the ARCSTONE project is to acquire accurate measurements of spectral lunar reflectance from space, allowing the Moon to be used as a high-accuracy, SI-traceable calibration reference by spaceborne sensors in low-Earth and geostationary orbits. The required spectral range is 350 to 2300 nm with 4-nm sampling. The ARCSTONE approach is to measure solar and lunar spectral irradiances with a single set of optics and determine spectrally resolved lunar reflectances via a direct ratioing method, eliminating long-term optical degradation effects. Lunar-irradiance values derived from these direct reflectance measurements are enabled by independently measured, SI-traceable spectral solar irradiances, essentially using the Sun as an on-orbit calibration reference. In an initial demonstration of this approach, a prototype ultraviolet-visible-near-infrared (348 to 910 nm) instrument was designed, fully assembled, characterized, and field tested. Our results demonstrate that this prototype ARCSTONE instrument provides a dynamic range larger than 10^6, which is necessary to measure both the solar and lunar signals directly, and suggest that uncertainties better than 0.5% (k = 1) in measuring lunar spectra can be achieved under proper operational scenarios. We present the design, characterization, and proof-of-concept field test of the ARCSTONE instrument prototype.
{"title":"ARCSTONE: calibration of lunar spectral reflectance from space. Prototype instrument concept, analysis, and results","authors":"Hans Courrier, Rand Swanson, Constantine Lukashin, Christine Buleri, John Carvo, Michael Cooney, Warren Davis, Alexander Halterman, Alan Hoskins, Trevor Jackson, Mike Kehoe, Greg Kopp, Thuan Nguyen, Noah Ryan, Carlos Roithmayr, Paul Smith, Mike Stebbins, Thomas Stone, Cindy Young","doi":"10.1117/1.jrs.17.044508","DOIUrl":"https://doi.org/10.1117/1.jrs.17.044508","url":null,"abstract":"The ARCSTONE project objective is to acquire accurate measurements of the spectral lunar reflectance from space, allowing the Moon to be used as a high-accuracy SI-traceable calibration reference by spaceborne sensors in low-Earth and geostationary orbits. The required spectral range is 350 to 2300 nm with 4-nm sampling. The ARCSTONE approach is to measure solar and lunar spectral irradiances with a single set of optics and determine spectrally resolved lunar reflectances via a direct ratioing method, eliminating long-term optical degradation effects. Lunar-irradiance values, derived from these direct reflectance measurements, are enabled by independently measured SI-traceable spectral solar irradiances, essentially using the Sun as an on-orbit calibration reference. In an initial attempt to demonstrate this approach, a prototype ultraviolet-visible-near infrared (348 to 910 nm) instrument was designed, fully assembled, characterized, and field tested. Our results demonstrate that this prototype ARCSTONE instrument provides a dynamic range larger than 106, which is necessary to directly measure both the solar and lunar signals, and suggest uncertainties better than 0.5% (k = 1) in measuring lunar spectra can be achieved under proper operational scenarios. We present the design, characterization, and proof-of-concept field-test of the ARCSTONE instrument prototype.","PeriodicalId":54879,"journal":{"name":"Journal of Applied Remote Sensing","volume":"29 5-6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135272457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discernment of complex lithologies utilizing different scattering and textural components of SAR and optical data through machine learning approaches in Jaisalmer, Rajasthan, India
Raja Biswas, Virendra Singh Rathore
DOI: 10.1117/1.jrs.17.044507

Accurate lithological mapping is difficult with standard image processing techniques. We apply different machine learning (ML) algorithms to dual-polarimetric synthetic aperture radar (SAR), optical data, and surface elevation images to map various lithologies in parts of the Jaisalmer district of Rajasthan, India. Different SAR-derived textural and decomposition parameters were also used to improve the discrimination of lithology units. Further, to improve classification accuracy, ML-based feature importance models, such as XGBoost, decision tree, and random forest, were used to select the most useful bands for lithology classification. A total of 14 ML classifiers were applied, and the best classifier was chosen by comparing their accuracies (overall accuracy, kappa coefficient, F1 score, and ROC-AUC curve). Of all the classifiers used in this study, the light gradient boosting machine (LightGBM) performed best in discriminating lithology (OA = 0.80, kappa coefficient = 0.75, and F1 score = 0.79). In addition, AUC values >0.9 were obtained for all lithology units with the LightGBM model, indicative of accurate discrimination of the different lithological classes.
{"title":"Discernment of complex lithologies utilizing different scattering and textural components of SAR and optical data through machine learning approaches in Jaisalmer, Rajasthan, India","authors":"Raja Biswas, Virendra Singh Rathore","doi":"10.1117/1.jrs.17.044507","DOIUrl":"https://doi.org/10.1117/1.jrs.17.044507","url":null,"abstract":"Accurate lithological mapping is a difficult task through standard image processing techniques. We utilize the application of different machine learning (ML) algorithms on dual polarimetric synthetic aperture radar (SAR), optical data, and surface elevation images to map various lithologies in parts of Jaisalmer district of Rajasthan, India. Different SAR-derived textural and decomposition parameters were also used to improve the discrimination of various lithology units. Further, to improve the classification accuracy, different ML-based feature importance models, such as XGboost, decision tree, and random forest were implemented to select the useful bands for the classification of lithology. A total of 14 different ML classifiers were applied, and the best classifier was chosen after comparing their accuracies (overall accuracy, kappa coefficient, F1 score, and ROC-AUC curve) to map the lithology. Out of all of the classifiers used in this study, light gradient boosting machine (lightgbm) performed better in discriminating lithology (OA = 0.80, kappa coefficient = 0.75, and F1 score 0.79). In addition, the AUC values (>0.9 in all lithology units) were obtained for the “lightgbm” model, which is indicative of accurate discrimination of different lithological classes.","PeriodicalId":54879,"journal":{"name":"Journal of Applied Remote Sensing","volume":"37 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136262599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the possibility to map submerged aquatic vegetation cover with Sentinel-2 in low-transparency waters
Ele Vahtmäe, Kaire Toming, Laura Argus, Tiia Möller-Raid, Martin Ligi, Tiit Kutser
DOI: 10.1117/1.jrs.17.044506

Modifications in the spatial and temporal abundance patterns of submerged aquatic vegetation (SAV) indicate changes in marine environmental conditions or physical disturbances and need to be monitored. Vegetation percent cover (%cover) is recognized as one of the key parameters in SAV monitoring. Coastal waters of the Baltic Sea are often turbid and contain high amounts of colored dissolved organic matter. These factors significantly reduce the water depth at which benthic parameters can be detected by remote sensing. Field campaigns were carried out in the low-transparency Pärnu Bay area to assess to what extent the multispectral Sentinel-2 (S2) satellite can be used for SAV %cover mapping in such waters. The average depth limit for S2 benthic vegetation detection remained near 1.5 to 2.0 m. Empirical and physics-based methods were applied to S2 imagery to compare their performance for SAV %cover retrieval. Both methods identified similar %cover patterns. Model validation showed that the R² of the best-performing models remained between 0.56 and 0.66 and the root-mean-square error between 22.11 and 28.06. As physics-based inversion models do not require an extensive set of training data for calibration, they can be used for retrospective time series analysis across multitemporal images.
{"title":"On the possibility to map submerged aquatic vegetation cover with Sentinel-2 in low-transparency waters","authors":"Ele Vahtmäe, Kaire Toming, Laura Argus, Tiia Möller-Raid, Martin Ligi, Tiit Kutser","doi":"10.1117/1.jrs.17.044506","DOIUrl":"https://doi.org/10.1117/1.jrs.17.044506","url":null,"abstract":"Modifications in submerged aquatic vegetation (SAV) spatial and temporal abundance patterns indicate changes in marine environmental conditions or physical disturbances and need to be monitored. Vegetation percent cover (%cover) is recognized as one of the key parameters in SAV monitoring. Coastal waters of the Baltic Sea are often turbid and contain high amount of colored dissolved organic matter. These factors significantly reduce the water depth, where benthic parameters can be detected by remote sensing. Field campaigns were carried out in a low-transparency Pärnu Bay area to assess to what extent multispectral Sentinel-2 (S2) satellite can be used for SAV %cover mapping in such waters. An average depth restriction for S2 benthic vegetation detection remained near 1.5 to 2.0 m. Empirical and physics-based methods were applied to S2 imagery to compare their performance for SAV %cover retrieval. Both methods identified similar %cover patterns. Model validation results showed that R2 of the best-performing models remained between 0.56 and 0.66 and root-mean-square error between 22.11 and 28.06. As physics-based inversion models do not require extensive set of training data for model calibration, those can be used for retrospective time series analysis across multitemporal images.","PeriodicalId":54879,"journal":{"name":"Journal of Applied Remote Sensing","volume":"3 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135268195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Land cover analysis of PolSAR images using probabilistic voting ensemble and integrated support vector machine
Mohamed AboElenean, Ashraf Helmy, Fawzy ElTohamy, Ahmed Azouz
DOI: 10.1117/1.jrs.17.044505

Land cover classification is a vital application of polarimetric synthetic aperture radar (PolSAR) images in fields such as agriculture monitoring and urban assessment. We introduce a modified and enhanced PolSAR image classification method combining six decomposition techniques, a support vector machine (SVM) based classifier, and a probabilistic voting ensemble (PVE) model. Our method addresses the challenges posed by the complexity of PolSAR data and the limited availability of labeled samples. The core of our approach lies in integrating multiple decomposition techniques as feature extractors, aiming to capture diverse scattering behaviors and uncover valuable information related to land cover characteristics. These techniques are the Huynen, Cloude, Freeman and Durden, H/A/alpha, Yamaguchi, and van Zyl decomposition methods. The extracted features are used as inputs for training the SVM base classifier. To enhance classification performance, a PVE model combines the predictions from each decomposition technique, considering the individual prediction confidence and the characteristics of the decomposition methods. A decision fusion process integrates the diverse predictions based on majority voting and estimated class probability, providing a more robust and reliable final label prediction and thereby improving the overall accuracy of classification. Experimental analyses are conducted on airborne and spaceborne PolSAR images covering various bands and land cover types to evaluate the effectiveness and robustness of the proposed method. The experimental results demonstrate that our approach yields more confident class predictions than alternative methods.
{"title":"Quantifying urban three-dimensional building form effects on land surface temperature: a case study of Beijing, China","authors":"Siyuan Li, Nannan Zhang","doi":"10.1117/1.jrs.17.048501","DOIUrl":"https://doi.org/10.1117/1.jrs.17.048501","url":null,"abstract":"","PeriodicalId":54879,"journal":{"name":"Journal of Applied Remote Sensing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135729642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Remote sensing image semantic segmentation method based on small target and edge feature enhancement
Huaijun Wang, Luqi Qiao, He Li, Xiujuan Li, Junhuai Li, Ting Cao, Chunyi Zhang
DOI: 10.1117/1.jrs.17.044503

Semantic segmentation of high-resolution remote sensing images based on deep learning has become a hot research topic and is widely applied. In current convolutional neural network structures, extracting target features through many stacked convolutional layers easily loses small-target features and blurs the boundaries between ground-object classes. Therefore, we propose P-Net, a remote sensing image semantic segmentation method that detects small targets and enhances edge features. The proposed network is based on an encoder-decoder structure. The decoder includes a progressive small target feature enhancement network (IFEN), a boundary refinement module (BRM), and a feature aggregation module (FIAM). First, the dense side-output features of the encoder network are utilized to learn small-target feature information and target edge features. Second, a pyramid segmentation attention module is introduced to effectively extract fine-grained and multi-scale spatial information; this module enhances the feature expression of small targets and obtains high-level semantic feature information. The boundary refinement module refines the low-level spatial feature information extracted by the encoder. Finally, to improve the accuracy of object segmentation boundaries, skip connections fuse high-level semantic information and low-level spatial information across layers; the connected features have the same spatial resolution but different semantic information. Six evaluation indices, namely mean intersection over union, frequency-weighted intersection over union, pixel accuracy, F1, recall, and precision, were used for verification on two public high-resolution remote sensing image datasets, the Gaofen image dataset (GID) and the Wuhan dense labeling dataset (WHDLD). On GID, the indices reached 78.90%, 78.87%, 87.76%, 87.74%, 87.51%, and 88.04%, respectively; on WHDLD, they reached 63.21%, 75.20%, 84.67%, 75.79%, 76.56%, and 75.45%, respectively. The results show that the proposed method outperforms DeepLabv3+, U-Net, PSPNet, and DUC_HDC; in particular, small-target features are recognized better, and the boundaries between object categories are clearer.
{"title":"Remote sensing image semantic segmentation method based on small target and edge feature enhancement","authors":"Huaijun Wang, Luqi Qiao, He Li, Xiujuan Li, Junhuai Li, Ting Cao, Chunyi Zhang","doi":"10.1117/1.jrs.17.044503","DOIUrl":"https://doi.org/10.1117/1.jrs.17.044503","url":null,"abstract":"Semantic segmentation of high-resolution remote sensing images based on deep learning has become a hot research topic and has been widely applied. At present, based on the structure of the convolutional neural network, when extracting target features through multiple layer convolutional layers, it is easy to cause the loss of small target features and fuzzy boundary of ground object classification. Therefore, we propose a remote sensing image semantic segmentation method P-Net to detect small target and enhance edge feature. The proposed network was based on an Encoder-Decoder structure. The decoder included the following components: a progressive small target feature enhancement network (IFEN), a boundary thinning module (BRM), and a feature aggregation module (FIAM). Firstly, the dense side output features of the encoder network were utilized to learn and acquired small target feature information and target edge features. Secondly, the pyramid segmentation attention module was introduced to effectively extract fine-grained and multi-scale spatial information. This module enhanced the feature expression of small targets and obtained high-level semantic feature information. The boundary refinement module was designed to refine the low-level spatial feature information extracted by the encoder. Finally, in order to improve the accuracy of remote sensing image object segmentation boundaries, skip connections were used to fuse high-level semantic information and low-level spatial information acrossed layers. These skip connections had the same spatial resolution but different semantic information. In this paper, six evaluation indices including mean intersection over union, frequency weighted intersection over union, pixel accuracy, F1, recall, and precision were used to verify on two public datasets of high-resolution remote sensing images, Gaofen image dataset (GID) and wuhan dense labeling dataset (WHDLD). In the GID dataset, each index reached 78.90%, 78.87%, 87.76%, 87.74%, 87.51%, and 88.04%, respectively; in the WHDLD dataset, each index reached 63.21%, 75.20%, 84.67%, 75.79%, 76.56%, and 75.45%, respectively. The results show that the performance of proposed method is better than that of DeepLabv3+, U-NET, PSPNet, and DUC_HDC methods. More precisely, the recognition performance of small target features is better, and the boundary obtained between object categories is clearer.","PeriodicalId":54879,"journal":{"name":"Journal of Applied Remote Sensing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135883049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}