Land cover change detection (LCCD) is a crucial research topic for the rational planning of land use and the sustainable development of land resources. Owing to the complexity of LCCD tasks, it is essential to integrate global and local features and to fuse contextual information from remote sensing imagery. The recently proposed Mamba, which maintains linear time complexity and processes long-range data efficiently, offers a new way to address the feature-fusion challenges in LCCD. Therefore, a novel visual state space model (SSM) for land cover change detection, LCCDMamba, is proposed, which uses Siam-VMamba as a backbone to extract multidimensional land cover features. To fuse change information across different temporal phases, a multiscale information spatio-temporal fusion (MISF) module is designed to aggregate difference information from bitemporal features. The proposed MISF comprises multiscale feature aggregation (MSFA), which utilizes strip convolution to aggregate multiscale local change information of bitemporal land cover features, and residual with SS2D (RSS), which employs a residual structure with SS2D to capture global feature differences of bitemporal land cover features. To enhance the correlation of change features across different dimensions, in the decoder we design a dual token modeling SSM (DTMS) built on two token modeling approaches. This preserves high-dimensional semantic features and thus ensures that multiscale change information across various dimensions is not lost during feature restoration. Experiments on the WHU-CD, LEVIR-CD, and GVLM datasets demonstrate that LCCDMamba achieves F1 scores of 94.18%, 91.68%, and 87.14%, respectively, outperforming all compared models.
{"title":"LCCDMamba: Visual State Space Model for Land Cover Change Detection of VHR Remote Sensing Images","authors":"Junqing Huang;Xiaochen Yuan;Chan-Tong Lam;Yapeng Wang;Min Xia","doi":"10.1109/JSTARS.2025.3531499","DOIUrl":"https://doi.org/10.1109/JSTARS.2025.3531499","url":null,"abstract":"Land cover change detection (LCCD) is a crucial research topic for rational planning of land use and facilitation of sustainable land resource growth. However, due to the complexity of LCCD tasks, integrating global and local features and fusing contextual information from remote sensing features are essential. Recently, with the advent of Mamba, which maintains linear time complexity and high efficiency in processing long-range data, it offers a new solution to address feature-fusion challenges in LCCD. Therefore, a novel visual state space model (SSM) for Land Cover Change Detection (LCCDMamba) is proposed, which uses Siam-VMamba as a backbone to extract multidimensional land cover features. To fuse the change information across difference temporal, multiscale information spatio-temporal fusion (MISF) module is designed to aggregate difference information from bitemporal features. The proposed MISF comprises multi-scale feature aggregation (MSFA), which utilizes strip convolution to aggregate multiscale local change information of bitemporal land cover features, and residual with SS2D (RSS) which employs residual structure with SS2D to capture global feature differences of bitemporal land cover features. To enhance the correlation of change features across different dimensions, in the decoder, we design a dual token modeling SSM (DTMS) through two token modeling approaches. This preserves high-dimensional semantic features and thus ensures that the multiscale change information across various dimensions will not be lost during feature restoration. Experiments have been conducted on WHU-CD, LEVIR-CD, and GVLM datasets, and the results demonstrate that LCCDMamba achieves F1 scores of 94.18%, 91.68%, and 87.14%, respectively, outperforming all the models compared.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"18 ","pages":"5765-5781"},"PeriodicalIF":4.7,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10845192","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High-resolution remote sensing imagery (RSI) plays a pivotal role in the semantic segmentation (SS) of urban scenes, particularly in urban management tasks such as building planning and traffic flow analysis. However, the dense distribution of objects and the prevalent background noise in RSI make it challenging to achieve stable and accurate results from a single view. Integrating digital surface models (DSMs) can achieve high-precision SS, but this often requires extensive computational resources. It is essential to address the tradeoff between accuracy and computational cost and to optimize the method for deployment on edge devices. In this article, we introduce an efficient multimodal symmetric network (EMSNet) designed to perform SS by leveraging both optical and DSM images. Unlike other multimodal methods, EMSNet adopts a dual encoder–decoder structure that builds a direct connection between DSM data and the final result, making full use of the DSM data. Between branches, we propose a continuous feature interaction that guides the DSM branch with RGB features. Within each branch, multilevel feature fusion captures low-level spatial and high-level semantic information, improving the model's scene perception. Meanwhile, knowledge distillation (KD) further improves the performance and generalization of EMSNet. Experiments on the Potsdam and Vaihingen datasets demonstrate the superiority of our method over other baseline models. Ablation experiments validate the effectiveness of each component. In addition, the KD strategy is validated by comparison with the segment anything model (SAM): it enables the proposed multimodal SS network to match SAM's performance with only one-fifth of the parameters, computation, and latency.
{"title":"EMSNet: Efficient Multimodal Symmetric Network for Semantic Segmentation of Urban Scene From Remote Sensing Imagery","authors":"Yejian Zhou;Yachen Wang;Jie Su;Zhenyu Wen;Puzhao Zhang;Wenan Zhang","doi":"10.1109/JSTARS.2025.3531422","DOIUrl":"https://doi.org/10.1109/JSTARS.2025.3531422","url":null,"abstract":"High-resolution remote sensing imagery (RSI) plays a pivotal role in the semantic segmentation (SS) of urban scenes, particularly in urban management tasks such as building planning and traffic flow analysis. However, the dense distribution of objects and the prevalent background noise in RSI make it challenging to achieve stable and accurate results from a single view. Integrating digital surface models (DSM) can achieve high-precision SS. But this often requires extensive computational resources. It is essential to address the tradeoff between accuracy and computational cost and optimize the method for deployment on edge devices. In this article, we introduce an efficient multimodal symmetric network (EMSNet) designed to perform SS by leveraging both optical and DSM images. Unlike other multimodal methods, EMSNet adopts a dual encoder–decoder structure to build a direct connection between DSM data and the final result, making full use of the advanced DSM. Between branches, we propose a continuous feature interaction to guide the DSM branch by RGB features. Within each branch, multilevel feature fusion captures low spatial and high semantic information, improving the model's scene perception. Meanwhile, knowledge distillation (KD) further improves the performance and generalization of EMSNet. Experiments on the Potsdam and Vaihingen datasets demonstrate the superiority of our method over other baseline models. Ablation experiments validate the effectiveness of each component. Besides, the KD strategy is confirmed by comparing it with the segment anything model (SAM). It enables the proposed multimodal SS network to match SAM's performance with only one-fifth of the parameters, computation, and latency.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"18 ","pages":"5878-5892"},"PeriodicalIF":4.7,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10845133","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-20, DOI: 10.1109/JSTARS.2025.3531984
Horatiu Florea;Sergiu Nedevschi
Aerial scene understanding systems face stringent payload restrictions and must often rely on monocular depth estimation for modeling scene geometry, which is an inherently ill-posed problem. Moreover, obtaining accurate ground truth data required by learning-based methods raises significant additional challenges in the aerial domain. Self-supervised approaches can bypass this problem, at the cost of providing only up-to-scale results. Similarly, recent supervised solutions which make good progress toward zero-shot generalization also provide only relative depth values. This work presents TanDepth, a practical scale recovery method for obtaining metric depth results from relative estimations at inference time, irrespective of the type of model generating them. Tailored for uncrewed aerial vehicle (UAV) applications, our method leverages sparse measurements from Global Digital Elevation Models (GDEM) by projecting them to the camera view using extrinsic and intrinsic information. An adaptation to the cloth simulation filter is presented, which allows selecting ground points from the estimated depth map to then correlate with the projected reference points. We evaluate and compare our method against alternate scaling methods adapted for UAVs, on a variety of real-world scenes. Considering the limited availability of data for this domain, we construct and release a comprehensive, depth-focused extension to the popular UAVid dataset to support further research.
{"title":"TanDepth: Leveraging Global DEMs for Metric Monocular Depth Estimation in UAVs","authors":"Horatiu Florea;Sergiu Nedevschi","doi":"10.1109/JSTARS.2025.3531984","DOIUrl":"https://doi.org/10.1109/JSTARS.2025.3531984","url":null,"abstract":"Aerial scene understanding systems face stringent payload restrictions and must often rely on monocular depth estimation for modeling scene geometry, which is an inherently ill-posed problem. Moreover, obtaining accurate ground truth data required by learning-based methods raises significant additional challenges in the aerial domain. Self-supervised approaches can bypass this problem, at the cost of providing only up-to-scale results. Similarly, recent supervised solutions which make good progress toward zero-shot generalization also provide only relative depth values. This work presents TanDepth, a practical scale recovery method for obtaining metric depth results from relative estimations at inference-time, irrespective of the type of model generating them. Tailored for uncrewed aerial vehicle (UAV) applications, our method leverages sparse measurements from Global Digital Elevation Models (GDEM) by projecting them to the camera view using extrinsic and intrinsic information. An adaptation to the cloth simulation filter is presented, which allows selecting ground points from the estimated depth map to then correlate with the projected reference points. We evaluate and compare our method against alternate scaling methods adapted for UAVs, on a variety of real-world scenes. Considering the limited availability of data for this domain, we construct and release a comprehensive, depth-focused extension to the popular UAVid dataset to further research.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"18 ","pages":"5445-5459"},"PeriodicalIF":4.7,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10848130","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143489143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-17, DOI: 10.1109/JSTARS.2025.3531439
Shuang Wu;Lei Deng;Qinghua Qiao
Accurate long-term estimation of fractional vegetation cover (FVC) is crucial for monitoring vegetation dynamics. Satellite-based methods, such as the dimidiate pixel method (DPM), struggle with spatial heterogeneity due to coarse resolution. Existing methods using unmanned aerial vehicles (UAVs) combined with satellite data (UCS) inadequately leverage the high spatial resolution of UAV imagery to address spatial heterogeneity and are seldom applied to long-term FVC monitoring. To overcome spatial challenges, an improved dimidiate pixel method (IDPM) is proposed here, utilizing 2021 Landsat imagery to generate FVC_DPM via DPM, with upscaled UAV imagery providing FVC_UAV as the ground reference. The IDPM uses the pruned exact linear time method to segment the normalized difference vegetation index (NDVI) into intervals, within which DPM performance is evaluated for potential improvements. Specifically, if the difference (D) between FVC_DPM and FVC_UAV is nonzero, NDVI-derived texture features are incorporated into FVC_DPM through multiple linear regression to enhance accuracy. To address temporal challenges and ensure consistency across years, the 2021 NDVI serves as a reference for inter-year NDVI calibration, employing least squares regression (LSR) and histogram matching (HM) to identify the most effective method for extending the IDPM to other years. Results demonstrate that 1) the IDPM, by developing distinct DPM improvement models for different NDVI intervals, considerably improves UAV and satellite data integration, with a 48.51% increase in R² and a 56.47% reduction in root mean square error (RMSE) compared to the DPM and UCS, and 2) HM is found to be more suitable for mining areas, increasing R² by 25.00% and reducing RMSE by 54.05% compared to LSR. This method provides an efficient, rapid solution for mitigating spatial heterogeneity and advancing long-term FVC estimation.
{"title":"Estimating Long-Term Fractional Vegetation Cover Using an Improved Dimidiate Pixel Method With UAV-Assisted Satellite Data: A Case Study in a Mining Region","authors":"Shuang Wu;Lei Deng;Qinghua Qiao","doi":"10.1109/JSTARS.2025.3531439","DOIUrl":"https://doi.org/10.1109/JSTARS.2025.3531439","url":null,"abstract":"Accurate long-term estimation of fractional vegetation cover (FVC) is crucial for monitoring vegetation dynamics. Satellite-based methods, such as the dimidiate pixel method (DPM), struggle with spatial heterogeneity due to coarse resolution. Existing methods using unmanned aerial vehicles (UAVs) combined with satellite data (UCS) inadequately leverage the high spatial resolution of UAV imagery to address spatial heterogeneity and are seldom applied to long-term FVC monitoring. To overcome spatial challenges, an improved dimidiate pixel method (IDPM) is proposed here, utilizing 2021 Landsat imagery to generate FVC<sub>DPM</sub> via DPM and upscaled UAV imagery for FVC<sub>UAV</sub> as ground references. The IDPM uses the pruned exact linear time method to segment the normalized difference vegetation index (NDVI) into intervals, within which DPM performance is evaluated for potential improvements. Specifically, if the difference (D) between FVC<sub>DPM</sub> and FVC<sub>UAV</sub> is nonzero, NDVI-derived texture features are incorporated into FVC<sub>DPM</sub> through multiple linear regression to enhance accuracy. To address temporal challenges and ensure consistency across years, the 2021 NDVI serves as a reference for inter-year NDVI calibration, employing least squares regression (LSR) and histogram matching (HM) to identify the most effective method for extending the IDPM to other years. Results demonstrate that 1) the IDPM, by developing distinct DPM improvement models for different NDVI intervals, considerably improves UAV and satellite data integration, with a 48.51% increase in <italic>R</i><sup>2</sup> and a 56.47% reduction in root mean square error (RMSE) compared to the DPM and UCS and 2) HM is found to be more suitable for mining areas, increasing <italic>R</i><sup>2</sup> by 25.00% and reducing RMSE by 54.05% compared to LSR. This method provides an efficient, rapid solution for mitigating spatial heterogeneity and advancing long-term FVC estimation.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"18 ","pages":"4162-4173"},"PeriodicalIF":4.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10845181","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143105990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-17, DOI: 10.1109/JSTARS.2025.3531353
Qun Song;Hangyuan Lu;Chang Xu;Rixian Liu;Weiguo Wan;Wei Tu
Pansharpening is the process of fusing a multispectral (MS) image with a panchromatic image to produce a high-resolution MS (HRMS) image. However, existing techniques face challenges in integrating long-range dependencies to correct locally misaligned features, which results in spatial-spectral distortions. Moreover, these methods tend to be computationally expensive. To address these challenges, we propose a novel detail injection algorithm and develop the invertible attention-guided adaptive convolution and dual-domain Transformer (IACDT) network. In IACDT, we design an invertible attention mechanism embedded with spectral-spatial attention to efficiently and losslessly extract local spatial-spectral-aware detail information. In addition, we present a frequency-spatial dual-domain attention mechanism that combines a frequency-enhanced Transformer and a spatial window Transformer for long-range contextual detail feature correction. This architecture effectively integrates local detail features with long-range dependencies, enabling the model to correct both local misalignments and global inconsistencies. The final HRMS image is obtained through a reconstruction block that consists of residual multireceptive field attention. Extensive experiments demonstrate that IACDT achieves superior fusion performance, computational efficiency, and outstanding results in downstream tasks compared to state-of-the-art methods.
{"title":"Invertible Attention-Guided Adaptive Convolution and Dual-Domain Transformer for Pansharpening","authors":"Qun Song;Hangyuan Lu;Chang Xu;Rixian Liu;Weiguo Wan;Wei Tu","doi":"10.1109/JSTARS.2025.3531353","DOIUrl":"https://doi.org/10.1109/JSTARS.2025.3531353","url":null,"abstract":"Pansharpening is the process of fusing a multispectral (MS) image with a panchromatic image to produce a high-resolution MS (HRMS) image. However, existing techniques face challenges in integrating long-range dependencies to correct locally misaligned features, which results in spatial-spectral distortions. Moreover, these methods tend to be computationally expensive. To address these challenges, we propose a novel detail injection algorithm and develop the invertible attention-guided adaptive convolution and dual-domain Transformer (IACDT) network. In IACDT, we designed an invertible attention mechanism embedded with spectral-spatial attention to efficiently and losslessly extract locally spatial-spectral-aware detail information. In addition, we presented a frequency-spatial dual-domain attention mechanism that combines a frequency-enhanced Transformer and a spatial window Transformer for long-range contextual detail feature correction. This architecture effectively integrates local detail features with long-range dependencies, enabling the model to correct both local misalignments and global inconsistencies. The final HRMS image is obtained through a reconstruction block that consists of residual multireceptive field attention. Extensive experiments demonstrate that IACDT achieves superior fusion performance, computational efficiency, and outstanding results in downstream tasks compared to state-of-the-art methods.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"18 ","pages":"5217-5231"},"PeriodicalIF":4.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10845120","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143422910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-17, DOI: 10.1109/JSTARS.2025.3530926
Tianxiang Wang;Zhangfan Zeng;ShiHe Zhou;Qiao Xu
Automatic target recognition based on synthetic aperture radar (SAR) has extensive applications in dynamic surveillance, modern airport management, and military decision-making. However, the inherent mechanisms of SAR imaging introduce challenges such as target feature discretization, clutter interference, and significant scale variation, which hinder the performance of existing recognition networks in practical scenarios. This article therefore presents a novel network architecture: the multiscale discrete feature enhancement network with augmented reversible transformation. The proposed network consists of three core components: an augmented feature extraction (AFE) backbone, a discrete feature enhancement module (DFEM), and a Spider feature pyramid network (Spider FPN). The AFE backbone effectively preserves target information and suppresses clutter by integrating augmented reversible transformations with an intermediate supervision module and double subnetworks. The DFEM enhances both local and global discrete feature awareness through its two submodules: a local discrete feature enhancement module and a global semantic information awareness module. The Spider FPN overcomes target scale variation challenges, especially for small-scale targets, through a fusion-diffusion mechanism and the designed feature perception fusion module. The proposed method is evaluated on three public datasets of various polarizations and environmental conditions: SARDet-100K, MSAR-1.0, and SAR-AIRcraft-1.0. Experimental results demonstrate that the proposed network outperforms current state-of-the-art methods, achieving average precision of 63.3%, 72.3%, and 67.4%, respectively.
{"title":"A Multiscale Discrete Feature Enhancement Network With Augmented Reversible Transformation for SAR Automatic Target Recognition","authors":"Tianxiang Wang;Zhangfan Zeng;ShiHe Zhou;Qiao Xu","doi":"10.1109/JSTARS.2025.3530926","DOIUrl":"https://doi.org/10.1109/JSTARS.2025.3530926","url":null,"abstract":"Automatic target recognition based on synthetic aperture radar (SAR) has extensive applications in dynamic surveillance, modern airport management, and military decision-making. However, the natural mechanisms of SAR imaging introduce challenges such as target feature discretization, clutter interference, and significant scale variation, which hinder the performance of existing recognition networks in practical scenarios. As such, this article presents a novel network architecture: the multiscale discrete feature enhancement network with augmented reversible transformation. The proposed network consists of three core components: an augmented feature extraction (AFE) backbone, a discrete feature enhancement module (DFEM), and a Spider feature pyramid network (Spider FPN). The AFE backbone has the capability of effective target information preservation and clutter suppression with the aid of integration of augmented reversible transformations with intermediate supervision module and double subnetworks. The DFEM enhances both local and global discrete feature awareness through its two submodules: local discrete feature enhancement module and global semantic information awareness module. The Spider FPN overcomes target scale variation challenges, especially for small-scale targets, through a fusion-diffusion mechanism and the designed feature perception fusion module. The functionality of the proposed method is evaluated on three public datasets: SARDet-100 K, MSAR-1.0, and SAR-AIRcraft-1.0 of various polarizations and environmental conditions. Experimental results demonstrate that the proposed network outperforms current state-of-the-art methods in terms of average precision by the levels of 63.3%, 72.3%, and 67.4%, respectively.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"18 ","pages":"5135-5156"},"PeriodicalIF":4.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10844330","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143422911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-17, DOI: 10.1109/JSTARS.2025.3531448
Chang-Jiang Zhang;Mei-Shu Chen;Lei-Ming Ma;Xiao-Qin Lu
Tropical cyclones (TCs) are highly catastrophic weather events, and accurate estimation of their intensity is of great significance. Existing TC intensity estimation models are typically trained on satellite images from only one or two channels and therefore cannot fully capture features related to TC intensity, resulting in low accuracy. To this end, we propose a double-layer encoder–decoder model for estimating TC intensity, trained on images from three channels: infrared, water vapor, and passive microwave. The model consists of three main modules: a wavelet transform enhancement module, a multichannel satellite image fusion module, and a TC intensity estimation module, which are used, respectively, to extract high-frequency information from the source images, generate a three-channel fused image, and perform TC intensity estimation. To validate the performance of our model, we conducted extensive experiments on the TCIR dataset. The experimental results show that the proposed model achieves an MAE of 3.76 m/s and an RMSE of 4.62 m/s for TC intensity estimation, which are 15.70% and 20.07% lower, respectively, than those of the advanced Dvorak technique. Therefore, the model proposed in this article has great potential for accurately estimating TC intensity.
{"title":"Deep Learning and Wavelet Transform Combined With Multichannel Satellite Images for Tropical Cyclone Intensity Estimation","authors":"Chang-Jiang Zhang;Mei-Shu Chen;Lei-Ming Ma;Xiao-Qin Lu","doi":"10.1109/JSTARS.2025.3531448","DOIUrl":"https://doi.org/10.1109/JSTARS.2025.3531448","url":null,"abstract":"Tropical cyclone (TC) is a highly catastrophic weather event, and accurate estimation of intensity is of great significance. The current proposed TC intensity estimation model focuses on training using satellite images from single or two channels, and the model cannot fully capture features related to TC intensity, resulting in low accuracy. To this end, we propose a double-layer encoder–decoder model for estimating the intensity of TC, which is trained using images from three channels: infrared, water vapor, and passive microwave. The model mainly consists of three modules: wavelet transform enhancement module, multichannel satellite image fusion module, and TC intensity estimation module, which are used to extract high-frequency information from the source image, generate a three-channel fused image, and perform TC intensity estimation. To validate the performance of our model, we conducted extensive experiments on the TCIR dataset. The experimental results show that the proposed model has MAE and RMSE of 3.76 m/s and 4.62 m/s for TC intensity estimation, which are 15.70% and 20.07% lower than advanced Dvorak technology, respectively. Therefore, the model proposed in this article has great potential in accurately estimating TC intensity.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"18 ","pages":"4711-4735"},"PeriodicalIF":4.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10845190","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143361110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-17, DOI: 10.1109/JSTARS.2025.3531107
Linye Zhu;Wenbin Sun;Deqin Fan;Huaqiao Xing;Haibo Ban
Greenhouses are a unique form of land use that plays a crucial role in agricultural production. Accurately mapping the spatial distribution of greenhouses and evaluating the level of regional agricultural development are of great significance for promoting the sustainable development of agriculture and guaranteeing national food security. In this study, the Google Earth Engine platform is used to obtain the spectral features, index features, and texture features of time-series Sentinel-1 and Sentinel-2 data to generate greenhouse spatial distribution maps of Shandong Province from 2019 to 2022. On this basis, a greenhouse-based agricultural development-level index is proposed to express and explore the regional agricultural development level. The results show that the overall accuracy of the greenhouse maps of Shandong Province is above 93%, providing a reliable foundation for subsequent analyses. Moreover, the proposed greenhouse-based agricultural development-level index, derived from the greenhouse distribution data, closely aligns with existing statistical information, effectively reflecting the regional agricultural development level.
{"title":"Exploring the Level of Agricultural Development Using Greenhouse Mapping: A Case Study of Shandong Province, China","authors":"Linye Zhu;Wenbin Sun;Deqin Fan;Huaqiao Xing;Haibo Ban","doi":"10.1109/JSTARS.2025.3531107","DOIUrl":"https://doi.org/10.1109/JSTARS.2025.3531107","url":null,"abstract":"Greenhouse is a unique form of land use that plays a crucial role in agricultural production. Accurately mapping the spatial distribution of greenhouses and accurately evaluating the level of regional agricultural development are of great significance in promoting the sustainable development of agriculture and guaranteeing national food security. In this study, the Google Earth Engine platform is used to obtain the spectral features, index features, and texture features of time-series Sentinel-1 data and Sentinel-2 data to generate the greenhouse spatial distribution maps of Shandong Province from 2019 to 2022. On this basis, a greenhouse-based agricultural development-level index is proposed for expressing and exploring the regional agricultural development level. The results of the study show that the overall accuracy of the greenhouse result map of Shandong Province is above 93%, providing a reliable foundation for subsequent analyses. Moreover, the proposed greenhouse-based agricultural development-level index, derived from greenhouse distribution data, closely aligns with the existing statistical information, effectively reflecting the regional agricultural development level.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"18 ","pages":"5793-5809"},"PeriodicalIF":4.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10844335","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-17, DOI: 10.1109/JSTARS.2025.3530959
Minkyung Chung;Yongil Kim
Image super-resolution (SR) aims to enhance the spatial resolution of images and overcome the hardware limitations of imaging systems. While deep-learning networks have significantly improved SR performance, obtaining paired low-resolution (LR) and high-resolution (HR) images for supervised learning remains challenging in real-world scenarios. In this article, we propose a novel unsupervised image super-resolution model for real-world remote sensing images, specifically focusing on HR satellite imagery. Our model, the bicubic-downsampled LR image-guided generative adversarial network for unsupervised learning (BLG-GAN-U), divides the SR process into two stages: LR image domain translation and image super-resolution. To implement this division, the model integrates omnidirectional real-to-synthetic domain translation with training strategies such as frequency separation and guided filtering. The model was evaluated through comparative analyses and ablation studies using real-world LR–HR datasets from WorldView-3 HR satellite imagery. The experimental results demonstrate that BLG-GAN-U effectively generates high-quality SR images with excellent perceptual quality and reasonable image fidelity, even with a relatively small network capacity.
{"title":"Unsupervised Image Super-Resolution for High-Resolution Satellite Imagery via Omnidirectional Real-to-Synthetic Domain Translation","authors":"Minkyung Chung;Yongil Kim","doi":"10.1109/JSTARS.2025.3530959","DOIUrl":"https://doi.org/10.1109/JSTARS.2025.3530959","url":null,"abstract":"Image super-resolution (SR) aims to enhance the spatial resolution of images and overcome the hardware limitations of imaging systems. While deep-learning networks have significantly improved SR performance, obtaining paired low-resolution (LR) and high-resolution (HR) images for supervised learning remains challenging in real-world scenarios. In this article, we propose a novel unsupervised image super-resolution model for real-world remote sensing images, specifically focusing on HR satellite imagery. Our model, the bicubic-downsampled LR image-guided generative adversarial network for unsupervised learning (BLG-GAN-U), divides the SR process into two stages: LR image domain translation and image super-resolution. To implement this division, the model integrates omnidirectional real-to-synthetic domain translation with training strategies such as frequency separation and guided filtering. The model was evaluated through comparative analyses and ablation studies using real-world LR–HR datasets from WorldView-3 HR satellite imagery. The experimental results demonstrate that BLG-GAN-U effectively generates high-quality SR images with excellent perceptual quality and reasonable image fidelity, even with a relatively smaller network capacity.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"18 ","pages":"4427-4445"},"PeriodicalIF":4.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10844307","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, flooding and droughts in the Yangtze River basin have become increasingly unpredictable. Remote sensing is an effective tool for monitoring water distribution. However, cloudy weather and mountainous terrain directly affect water extraction from remote sensing images. A single data source cannot resolve this issue and often encounters the challenge of “different features having the same spectrum.” To address these problems, we constructed a dataset using both active and passive remote sensing data and designed a partitioning scheme, with corresponding water body extraction rules, for areas containing multiple terrain types. This partitioning method and its associated rules significantly reduce the false positive rate of water extraction in mountainous areas. Our approach successfully extracts water bodies from cloudy optical imagery without being hindered by cloud cover, thereby enhancing the usability of optical remote sensing images. The accuracy of our method reaches 91.73%, with a Kappa value of 0.90. In multi-terrain areas, our method's Kappa coefficient is 0.39 higher than that of the synthetic aperture radar and optical imagery water index and 0.06 higher than that of Res-U-Net, showing superior performance and greater stability in mountainous and cloudy regions. In conclusion, this method facilitates consistent water extraction on large datasets.
{"title":"A Water Extraction Method for Multiple Terrains Area Based on Multisource Fused Images: A Case Study of the Yangtze River Basin","authors":"Huang Ruolong;Shen Qian;Fu Bolin;Yao Yue;Zhang Yuting;Du Qianyu","doi":"10.1109/JSTARS.2025.3531505","DOIUrl":"https://doi.org/10.1109/JSTARS.2025.3531505","url":null,"abstract":"In recent years, flooding and droughts in the Yangtze River basin have become increasingly unpredictable. Remote sensing is an effective tool for monitoring water distribution. However, cloudy weather and mountainous terrain directly affect water extraction from remote sensing images. A single data source cannot resolve this issue and often encounters the challenge of “different features having the same spectrum.” To address these problems, we constructed a dataset using both active and passive remote sensing data and designed a partitioning scheme with corresponding water body extraction rules for multiple terrains area. This partitioning method and its associated rules significantly reduce the false positive rate of water extraction in mountainous areas. Our approach successfully extracts water bodies from cloudy optical imagery without being hindered by cloud cover, thereby enhancing the usability of optical remote sensing images. The accuracy of our method reaches 91.73%, with a Kappa value of 0.90. In multiple terrains area, our method's Kappa coefficient is 0.39 higher than synthetic aperture radar and optical imagery water index and 0.06 higher than Res-U-Net. It shows superior performance and greater stability in mountainous and cloudy regions. In conclusion, this method facilitates consistent water extraction on large datasets.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"18 ","pages":"4964-4978"},"PeriodicalIF":4.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10845130","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143388610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}