Pub Date : 2026-01-23 DOI: 10.1016/j.jag.2025.105070
RPDNet: Street-level road pavement damage detection with a real-time anchor-free network
Jian Kang, Haiyan Guan, Dedong Zhang, Lingfei Ma, Lanying Wang, Yongtao Yu, Linlin Xu, Jonathan Li
Accurate and timely detection of road pavement damage helps monitor the extent of road deterioration, thereby guiding maintenance projects and ensuring traffic safety. Nevertheless, owing to the textural similarity and nested distribution of neighboring pavement damages, as well as their diverse sizes, irregular shapes, and multiple categories, current methods struggle to deliver high-quality detection from street-level road images. To tackle these challenges, this paper develops a novel real-time anchor-free network with a one-stage processing architecture, named RPDNet, for precisely and accurately detecting pavement damages from street-level road images. First, with a layer-by-layer encoding structure boosted by a deformable fully-attentive module as the backbone extractor, RPDNet captures fine-grained information and generates multiscale, strongly task-aware semantics, significantly favoring the discrimination of noteworthy textural and geometric features. Then, by adopting a multi-level efficient aggregation neck, RPDNet promotes informative spatial details and integrates damage encoding features from different levels, contributing to a lightweight and well-optimized overall architecture. Afterward, a dual-large-kernel module embedded in a decoupled detection head with anchor-free guidance allows RPDNet to capture the long-range dependencies of salient, task-oriented pavement damage objects by adaptively aggregating information across large kernels in the spatial domain. Qualitative and quantitative evaluations confirmed that RPDNet provides a promising solution for detecting pavement damages in industrial applications under complex street-level road conditions. Furthermore, comparative analysis against the latest anchor-based and anchor-free alternatives demonstrated the superiority and generalization ability of RPDNet in pavement damage detection tasks. The assessment results show that RPDNet obtained an average mAP@0.5, mAP@0.5:0.95, precision, and recall of 69.16%, 44.86%, 72.59%, and 60.41%, respectively, on two datasets. Additionally, we constructed a large multi-city road pavement damage image dataset to support urban road health monitoring.
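The dual-large-kernel idea lends itself to a minimal PyTorch sketch: two parallel depthwise convolutions with different large kernels, mixed by a gate learned from pooled global context. The module name, kernel sizes, and gating scheme here are illustrative assumptions, not the authors' published implementation.

import torch
import torch.nn as nn

class DualLargeKernel(nn.Module):
    # Hypothetical block: depthwise convs keep the cost low despite the
    # large receptive fields, consistent with the real-time goal.
    def __init__(self, channels: int, k_small: int = 7, k_large: int = 11):
        super().__init__()
        self.dw_small = nn.Conv2d(channels, channels, k_small,
                                  padding=k_small // 2, groups=channels)
        self.dw_large = nn.Conv2d(channels, channels, k_large,
                                  padding=k_large // 2, groups=channels)
        # Gate predicts a per-channel mixing weight from pooled context.
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(channels, channels, 1),
                                  nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = self.dw_small(x), self.dw_large(x)
        w = self.gate(x)                   # (N, C, 1, 1), in [0, 1]
        return x + w * a + (1.0 - w) * b   # residual, adaptively mixed

feats = torch.randn(2, 64, 80, 80)
print(DualLargeKernel(64)(feats).shape)    # torch.Size([2, 64, 80, 80])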
{"title":"RPDNet: Street-level road pavement damage detection with a real-time anchor-free network","authors":"Jian Kang, Haiyan Guan, Dedong Zhang, Lingfei Ma, Lanying Wang, Yongtao Yu, Linlin Xu, Jonathan Li","doi":"10.1016/j.jag.2025.105070","DOIUrl":"https://doi.org/10.1016/j.jag.2025.105070","url":null,"abstract":"Accurately and timely detecting road pavement damage helps monitor road deterioration extent, thereby guiding maintenance projects and ensuring traffic safety. Nevertheless, due to textural similarity and nested distribution between neighboring pavement damages, as well as the damages with the diversity sizes, irregular shapes, multiple categories, current methods have the limitation in the high-quality detection from road street-level images. To tackle these challenges, this paper develops a novel real-time anchor-free network with a one-stage processing architecture, named RPDNet, for precisely and accurately detecting pavement damages from streel-level road images. First, stacked with a layer-by-layer encoding structure boosted by a deformable fully-attentive module as the backbone extractor, the RPDNet can capture more fine-grained information and generate multiscale strong task-aware semantics, favoring significantly the discrimination noteworthy textural and geometric features. Then, by adopting a multi-level efficient aggregation neck, the RPDNet can promote informative spatial details and integrate the different-level damage encoding features, contributing to the light-weight and optimization of the whole architecture. Afterward, designed with a dual-large kernel module, embedded in a decoupled detection head with anchor-free guidance, the RPDNet can project the ranging dependency of salient and task-oriented pavement damage objects by adaptively aggregating information across large kernels in spatial-domain. Qualitative and quantitative evaluations confirmed that the RPDNet provided a promiseful solution for detecting pavement damages in industrial applications under complex street-level road conditions. Furthermore, comparative analysis with the latest anchor-based and anchor-free alternatives also proved the superiority and generalization of the RPDNet in pavement damage detection tasks. The assessment results displayed that the RPDNet obtained an average <ce:italic>mAP@0.5</ce:italic>, <ce:italic>mAP@0.5:0.95</ce:italic>, <ce:italic>precision</ce:italic>, and <ce:italic>recall</ce:italic> of 69.16%, 44.86%, 72.59%, and 60.41%, respectively, on two dataset. Additionally, we constructed a large-size multi-city road pavement damage image dataset to support urban road health monitoring.","PeriodicalId":50341,"journal":{"name":"International Journal of Applied Earth Observation and Geoinformation","volume":"288 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146047912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-23 DOI: 10.1016/j.jag.2026.105101
Recognition of salt-marsh fairy circles in conventional optical satellite imagery: A generalizable framework with multiple machine learning models and imbalanced Bayesian probability updating
Jianru Yang, Hao Zheng, Weiwei Sun, Yuekai Hu, Weiguo Zhang, Chunpeng Chen, Yunxuan Zhou, Heqin Cheng, Weiming Xie, Kai Tan
Salt-marsh fairy circles (FCs) are enigmatic, quasi-circular structures linked to interacting biogeophysical processes, yet they remain difficult to detect and quantify at scale from conventional RGB imagery. Limited labeled data, transient and variable FC appearance, and severe class imbalance make single-model machine learning (ML) unreliable for quantitative monitoring. We propose a framework for automatic FC recognition and enumeration on 3-band imagery. A zero-shot foundation model (SAM) segments images into instance-level blocks. Novel distribution-pattern and geometric features, class-equalized losses, weighted resampling, and augmentation are applied within deep-learning (U-Net, Attention-U-Net, Swin-Unet) and ensemble-learning (Random Forest, XGBoost) models. The key innovation is an imbalance-aware Bayesian method that fuses pixel-wise probabilities across models; a counting algorithm then tallies FC instances. We evaluate eight pan-sharpened scenes covering four sites along China's coast. No individual ML model or standard Bayesian fusion is fully satisfactory. The imbalance-aware Bayesian method improves over the best single model: under the tight scheme, κ rises from 0.69 to 0.76, the F1-score from 70.9% to 75.8% (Class 1) and from 63.5% to 68.2% (Class 2), and the AUC from 84.8% to 93.1% and from 78.5% to 84.8%; under the loose scheme, κ increases from 0.74 to 0.79, the AUC from 85.1% to 90.3%, and the F1-score from 74.3% to 78.6%. The counting algorithm achieves an RMSE of 1.62 and a MAPE of 0.33% over 1,135 instances, outperforming DBSCAN. A 22-month case study on Chongming Island captures marsh expansion and dieback dynamics through shifts between FC classes. Our framework delivers reliable FC recognition and enumeration on a small dataset with severe class imbalance, generalizing across salt-marsh types.
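The fusion step admits a compact sketch under a conditional-independence (naive Bayes) assumption: each model's posterior is divided by the skewed training prior before the per-model evidence is multiplied, and a balanced prior is re-applied at the end. This is one plausible reading of the imbalance-aware updating, not the paper's exact formulation.

import numpy as np

def fuse_posteriors(posteriors, train_prior, target_prior=None):
    """posteriors: (M, N, C) class probabilities from M models for N pixels."""
    posteriors = np.asarray(posteriors, dtype=float)
    M, N, C = posteriors.shape
    if target_prior is None:
        target_prior = np.full(C, 1.0 / C)          # balanced prior
    # Remove the training-set skew from each model, then recombine.
    log_lik = np.log(posteriors + 1e-12) - np.log(train_prior)
    fused = log_lik.sum(axis=0) + np.log(target_prior)
    fused -= fused.max(axis=1, keepdims=True)       # stabilize the softmax
    p = np.exp(fused)
    return p / p.sum(axis=1, keepdims=True)

# Two models, three pixels, two classes with a 9:1 training imbalance.
p1 = np.array([[0.8, 0.2], [0.6, 0.4], [0.9, 0.1]])
p2 = np.array([[0.7, 0.3], [0.5, 0.5], [0.95, 0.05]])
print(fuse_posteriors([p1, p2], train_prior=np.array([0.9, 0.1])))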
{"title":"Recognition of salt-marsh fairy circles in conventional optical satellite imagery: A generalizable framework with multiple machine learning models and imbalanced Bayesian probability updating","authors":"Jianru Yang, Hao Zheng, Weiwei Sun, Yuekai Hu, Weiguo Zhang, Chunpeng Chen, Yunxuan Zhou, Heqin Cheng, Weiming Xie, Kai Tan","doi":"10.1016/j.jag.2026.105101","DOIUrl":"https://doi.org/10.1016/j.jag.2026.105101","url":null,"abstract":"Salt-marsh Fairy circles (FC) are enigmatic, quasi-circular structures linked to interacting biogeophysical processes, yet they remain difficult to detect and quantify at scale from conventional RGB imagery. Limited labeled data, transient and variable FC appearance, and severe class-imbalance make single-model machine learning (ML) unreliable for quantitative monitoring. We propose a framework for automatic FC recognition and enumeration on 3-band imagery. A zero-shot foundation model (SAM) segments images into instance-level blocks. Novel distribution-pattern and geometric features, class-equalized losses, weighted resampling, and augmentation are applied within deep-learning (U-Net, Attention-U-Net, Swin-Unet) and ensemble-learning (Random Forest, XGBoost) models. The key innovation is an imbalance-aware Bayesian method that fuses pixel-wise probabilities across models; a counting algorithm then tallies FC instances. We evaluate eight pan-sharpened scenes covering four sites along China’s coast. No individual ML model or standard Bayesian fusion is fully satisfactory. The imbalance-aware Bayesian method improves over the best single model: tight scheme: κ rises from 0.69 to 0.76, F1-score from 70.9% to 75.8% (Class 1) and from 63.5% to 68.2% (Class 2), and AUC from 84.8% to 93.1% and from 78.5% to 84.8%; loose scheme: κ increases from 0.74 to 0.79, AUC from 85.1% to 90.3%, F1-score from 74.3% to 78.6%. The counting algorithm achieves RMSE 1.62 and MAPE 0.33% over 1,135 instances, outperforming DBSCAN. A 22-month case study on Chongming Island captures marsh expansion and dieback dynamics through shifts between FC classes. Our framework delivers reliable FC recognition and enumeration on a small dataset with severe class-imbalance, generalizing across salt-marsh types.","PeriodicalId":50341,"journal":{"name":"International Journal of Applied Earth Observation and Geoinformation","volume":"14 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146047827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-23 DOI: 10.1016/j.jag.2026.105113
DGL-RSIS: Decoupling global spatial context and local class semantics for training-free remote sensing image segmentation
Boyi Li, Ce Zhang, Richard M. Timmerman, Wenxuan Bao
The emergence of vision language models (VLMs) bridges the gap between vision and language, enabling multimodal understanding beyond traditional visual-only deep learning models. However, transferring VLMs from the natural image domain to remote sensing (RS) segmentation remains challenging due to the large domain gap and the diversity of RS inputs across tasks, particularly in open-vocabulary semantic segmentation (OVSS) and referring expression segmentation (RES). Here, we propose a training-free unified framework, termed DGL-RSIS, which decouples visual and textual representations and performs visual-language alignment at both local semantic and global contextual levels. Specifically, a Global–Local Decoupling (GLD) module decomposes textual inputs into local semantic tokens and global contextual tokens, while image inputs are partitioned into class-agnostic mask proposals. Then, a Local Visual–Textual Alignment (LVTA) module adaptively extracts context-aware visual features from the mask proposals and enriches textual features through knowledge-guided prompt engineering, achieving OVSS from a local perspective. Furthermore, a Global Visual–Textual Alignment (GVTA) module employs a global-enhanced Grad-CAM mechanism to capture contextual cues for referring expressions, followed by a mask selection module that integrates pixel-level activations into mask-level segmentation outputs, thereby achieving RES from a global perspective. Experiments on the iSAID (OVSS) and RRSIS-D (RES) benchmarks demonstrate that DGL-RSIS outperforms existing training-free approaches. Ablation studies further validate the effectiveness of each module. To the best of our knowledge, this is the first unified training-free framework for RS image segmentation, which effectively transfers the semantic capability of VLMs trained on natural images to the RS domain without additional training.
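The closing mask-selection step, integrating pixel-level activations into mask-level outputs, reduces to scoring each class-agnostic proposal against the activation map. A minimal sketch, where the mean-activation score and threshold are our assumptions:

import numpy as np

def select_mask(activation, proposals, min_score=0.2):
    """activation: (H, W) map in [0, 1]; proposals: (K, H, W) boolean masks."""
    scores = np.array([activation[m].mean() if m.any() else 0.0
                       for m in proposals])
    best = int(scores.argmax())
    # Return no mask when even the best proposal is weakly activated.
    return (best if scores[best] >= min_score else None), scores

act = np.zeros((8, 8)); act[2:5, 2:5] = 0.9        # hot region from Grad-CAM
props = np.zeros((2, 8, 8), dtype=bool)
props[0, 2:5, 2:5] = True                          # overlaps the hot region
props[1, 5:8, 5:8] = True                          # background proposal
print(select_mask(act, props)[0])                  # -> 0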
{"title":"DGL-RSIS: Decoupling global spatial context and local class semantics for training-free remote sensing image segmentation","authors":"Boyi Li, Ce Zhang, Richard M. Timmerman, Wenxuan Bao","doi":"10.1016/j.jag.2026.105113","DOIUrl":"https://doi.org/10.1016/j.jag.2026.105113","url":null,"abstract":"The emergence of vision language models (VLMs) bridges the gap between vision and language, enabling multimodal understanding beyond traditional visual-only deep learning models. However, transferring VLMs from the natural image domain to remote sensing (RS) segmentation remains challenging due to the large domain gap and the diversity of RS inputs across tasks, particularly in open-vocabulary semantic segmentation (OVSS) and referring expression segmentation (RES). Here, we propose a training-free unified framework, termed DGL-RSIS, which decouples visual and textual representations and performs visual-language alignment at both local semantic and global contextual levels. Specifically, a Global–Local Decoupling (GLD) module decomposes textual inputs into local semantic tokens and global contextual tokens, while image inputs are partitioned into class-agnostic mask proposals. Then, a Local Visual–Textual Alignment (LVTA) module adaptively extracts context-aware visual features from the mask proposals and enriches textual features through knowledge-guided prompt engineering, achieving OVSS from a local perspective. Furthermore, a Global Visual–Textual Alignment (GVTA) module employs a global-enhanced Grad-CAM mechanism to capture contextual cues for referring expressions, followed by a mask selection module that integrates pixel-level activations into mask-level segmentation outputs, thereby achieving RES from a global perspective. Experiments on the iSAID (OVSS) and RRSIS-D (RES) benchmarks demonstrate that DGL-RSIS outperforms existing training-free approaches. Ablation studies further validate the effectiveness of each module. To the best of our knowledge, this is the first unified training-free framework for RS image segmentation, which effectively transfers the semantic capability of VLMs trained on natural images to the RS domain without additional training.","PeriodicalId":50341,"journal":{"name":"International Journal of Applied Earth Observation and Geoinformation","volume":"30 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146047869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-22 DOI: 10.1016/j.jag.2026.105123
Long-term plastic greenhouse mapping based on automatic sample generation and multi-temporal noise correction: A case study of Huang-Huai-Hai Plain
Xiaoping Zhang, Bo Cheng, Peng Huang, Chenbin Liang, Min Zhao, Guizhou Wang, Qinxue He, Yaocan Gan
Plastic greenhouses (PGs), as a typical form of facility agriculture, play a crucial role in stabilizing agricultural production and increasing crop yields, but their rapid expansion has raised environmental concerns. Accurate long-term PG monitoring is therefore essential for scientific agricultural regulation and environmental sustainability. However, most existing studies have focused on local regions or single-year mapping, and long-term PG mapping remains limited. Moreover, acquiring multi-year high-quality training samples and developing effective classification algorithms remain major challenges for reliable PG extraction. To address these issues, we propose a novel PG mapping framework that integrates automatic sample generation with multi-temporal noise correction (MTNC) and utilizes Landsat time-series images to efficiently and accurately map the multi-year distribution of PGs in the Huang-Huai-Hai Plain. Specifically, high-quality training samples were automatically generated from multi-source land use/land cover and PG products through spatial rules and sample migration, followed by preliminary classification with Random Forest. The initial predictions were then refined through the MTNC strategy, and the optimized labels were subsequently employed to train a segmentation network for robust PG extraction. Accuracy assessments on two independent validation datasets demonstrate that the final PG maps achieve overall accuracies above 90% and Kappa coefficients greater than 0.8 across all years. Cross-comparisons with existing PG products at multiple spatial resolutions show a high level of spatial consistency (R² = 0.91 with PGs-10 and 0.74 with PGs-3), further confirming the reliability of the proposed framework and the high quality of the final products.
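The abstract does not detail MTNC; a toy temporal-consistency version would flag a year's label as noise when both neighboring years disagree with it but agree with each other. A sketch under that assumption:

import numpy as np

def temporal_correct(labels):
    """labels: (T, N) integer class labels for T years and N pixels."""
    fixed = labels.copy()
    for t in range(1, labels.shape[0] - 1):
        prev_, next_ = labels[t - 1], labels[t + 1]
        flip = (prev_ == next_) & (labels[t] != prev_)   # isolated outlier
        fixed[t, flip] = prev_[flip]
    return fixed

years = np.array([[1, 1, 0],
                  [0, 1, 0],     # pixel 0 flickers to class 0 for one year
                  [1, 1, 1]])
print(temporal_correct(years))   # pixel 0 in the middle year restored to 1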
{"title":"Long-term plastic greenhouse mapping based on automatic sample generation and multi-temporal noise correction: A case study of Huang-Huai-Hai Plain","authors":"Xiaoping Zhang, Bo Cheng, Peng Huang, Chenbin Liang, Min Zhao, Guizhou Wang, Qinxue He, Yaocan Gan","doi":"10.1016/j.jag.2026.105123","DOIUrl":"https://doi.org/10.1016/j.jag.2026.105123","url":null,"abstract":"Plastic greenhouses (PGs), as a typical form of facility agriculture, play a crucial role in stabilizing agricultural production and increasing crop yields, but their rapid expansion has raised environmental concerns. Accurate long-term PGs monitoring is therefore essential for scientific agricultural regulation and environmental sustainability. However, most existing studies have focused on local regions or single-year mapping, and long-term PGs mapping remains limited. Moreover, acquiring multi-year high-quality training samples and developing effective classification algorithms remain major challenges for reliable PGs extraction. To address these issues, we propose a novel PGs mapping framework that integrates automatic sample generation with multi-temporal noise correction (MTNC), and utilizes Landsat time-series images to efficiently and accurately map multi-year PGs distribution in the Huang-Huai-Hai Plain. Specifically, high-quality training samples were automatically generated from multi-source land use/land cover and PGs products through spatial rules and sample migration, followed by preliminary classification with Random Forest. The initial predictions were then refined through the MTNC strategy, and the optimized labels were subsequently employed to train a segmentation network for robust PGs extraction. Accuracy assessments on two independent validation datasets demonstrate that the final PGs maps achieve overall accuracies above 90% and Kappa coefficients greater than 0.8 across all years. And cross-comparisons with existing PGs products at multiple spatial resolutions show a high level of spatial consistency (<mml:math altimg=\"si1.svg\" display=\"inline\"><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math> = 0.91 with PGs-10 and 0.74 with PGs-3), further confirming the reliability of the proposed framework and the high quality of the final products.","PeriodicalId":50341,"journal":{"name":"International Journal of Applied Earth Observation and Geoinformation","volume":"67 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146047871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-21 DOI: 10.1016/j.jag.2026.105115
Relative sea-level rise and inundation risks in the Bohai Rim: Dominant role of vertical land motion
Zhiqiang Gong, Jianzhong Wu, Jie Dong, Qianye Lan, Shangjing Lai, Jinxin Lin, Mingsheng Liao
Coastal regions worldwide are increasingly threatened by Relative Sea Level Rise (RSLR), which results from the combined effects of Sea Level Rise (SLR) and Vertical Land Motion (VLM). Satellite Interferometric Synthetic Aperture Radar (InSAR) can measure millimeter-scale VLM. However, the spatiotemporal variability of VLM remains poorly quantified in RSLR estimation and projection and in flood inundation assessment. This study constructed an RSLR dataset for the Bohai Rim by integrating high-resolution InSAR-derived VLM with SLR. We developed a dynamic flood inundation model that incorporates hydrological connectivity and flow-path attenuation factors to improve flood risk assessment. The results show significant spatial variability of VLM, concentrated along muddy and sandy coastlines. VLM dominates the spatial patterns and magnitude of RSLR. The maximum inundation extent and depth reach 17,756 km² and 5.4 m, respectively, under the RSLR-SSP5-8.5 scenario by 2100, threatening 10.4 million residents. Uncertainties in inundation projection can be reduced by considering the drivers and nonlinear evolution of VLM, and flood protection infrastructure can substantially reduce inundation. Our findings highlight the crucial importance of incorporating VLM into coastal risk assessments and provide insights for RSLR adaptation strategies.
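A connectivity-aware "bathtub" step of the kind described, flooding only cells that lie below the water level and are reachable from the sea, can be sketched as a breadth-first flood fill; the fixed per-step attenuation standing in for flow-path attenuation is our illustrative assumption.

from collections import deque
import numpy as np

def inundate(dem, sea_mask, water_level, attenuation=0.0):
    """dem: (H, W) elevations in m; sea_mask: (H, W) boolean seed cells."""
    H, W = dem.shape
    depth = np.full((H, W), np.nan)
    queue = deque((r, c, water_level) for r, c in zip(*np.nonzero(sea_mask)))
    seen = sea_mask.copy()
    while queue:
        r, c, lvl = queue.popleft()
        if dem[r, c] >= lvl:
            continue                        # dry cell: the path stops here
        depth[r, c] = lvl - dem[r, c]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < H and 0 <= cc < W and not seen[rr, cc]:
                seen[rr, cc] = True
                queue.append((rr, cc, lvl - attenuation))  # decays inland
    return depth

dem = np.array([[0.0, 1.0, 3.0],
                [0.5, 2.5, 0.2]])           # low cell at (1, 2) is landlocked
sea = np.zeros_like(dem, dtype=bool); sea[:, 0] = True
print(inundate(dem, sea, water_level=2.0))  # (1, 2) stays dry: no connectivity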
{"title":"Relative sea-level rise and inundation risks in the Bohai Rim: Dominant role of vertical land motion","authors":"Zhiqiang Gong, Jianzhong Wu, Jie Dong, Qianye Lan, Shangjing Lai, Jinxin Lin, Mingsheng Liao","doi":"10.1016/j.jag.2026.105115","DOIUrl":"https://doi.org/10.1016/j.jag.2026.105115","url":null,"abstract":"Coastal regions worldwide are increasingly threatened by Relative Sea Level Rise (RSLR), which results from the combined effects of Sea Level Rise (SLR) and Vertical Land Motion (VLM). Satellite Interferometric Synthetic Aperture Radar (InSAR) can measure millimeter-scale VLM. However, the spatiotemporal variability of VLM remains poorly quantified in RSLR estimation and projection and flood inundation assessment. This study constructed an RSLR dataset for the Bohai Rim by integrating high-resolution InSAR-derived VLM with SLR. We develop a dynamic flood inundation model by incorporating hydrological connectivity and flow path attenuation factors to improve flood risk assessment. The results show significant spatial variability of VLM and concentration on muddy and sandy coastlines. VLM dominates the spatial patterns and magnitude of RSLR. The maximum inundation extent and depth reach 17,756 km<ce:sup loc=\"post\">2</ce:sup> and 5.4 m, respectively, under the RSLR-SSP5-8.5 scenario by 2100, which threatens 10.4 million residents. The uncertainties in inundation projection can be reduced by considering the drivers and nonlinear evolution of VLM. The flood protection infrastructures can reduce inundation largely. Our findings highlight the crucial importance of incorporating VLM into coastal risk assessments and provide insights for RSLR adaptation strategies.","PeriodicalId":50341,"journal":{"name":"International Journal of Applied Earth Observation and Geoinformation","volume":"21 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146047876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-21 DOI: 10.1016/j.jag.2026.105085
Reverse degradation for remote sensing pan-sharpening
Jiang He, Xiao Xiang Zhu
Accurate pan-sharpening of multispectral images is essential for high-resolution remote sensing, yet supervised methods are limited by the need for paired training data and poor generalization. Existing unsupervised approaches often neglect the physical consistency between degradation and fusion and lack sufficient constraints, resulting in suboptimal performance in complex scenarios. We propose RevFus, a novel two-stage pan-sharpening framework. In the first stage, an invertible neural network models the degradation process and reverses it for fusion with cycle-consistency self-learning, ensuring a physically grounded mapping. In the second stage, structural detail compensation and spatial–spectral contrastive learning alleviate detail loss and enhance spectral–spatial fidelity. To further understand the network's decision-making, we design a quantitative and systematic measure of model interpretability, the Interpretability Efficacy Coefficient (IEC). The IEC integrates multiple statistics derived from SHapley Additive exPlanations (SHAP) values into a single unified score to evaluate how effectively a model balances spatial detail enhancement with spectral preservation. Experiments on three datasets demonstrate that RevFus outperforms state-of-the-art unsupervised and traditional methods, delivering superior spectral fidelity, enhanced spatial detail, and high model interpretability, thereby validating the effectiveness of the interpretable deep learning framework for robust, high-quality pan-sharpening.
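One plausible form of the degradation-consistency self-supervision is sketched below: the fused product, re-degraded, must reproduce the observed low-resolution multispectral input, while its band-average intensity must match the PAN. The avg-pooling and band-mean operators are placeholders for the learned invertible degradation in RevFus.

import torch
import torch.nn.functional as F

def self_supervised_losses(fused, ms_lr, pan, scale=4):
    # Spatial degradation assumed to be area-averaging to the MS resolution.
    redegraded = F.avg_pool2d(fused, kernel_size=scale)
    spectral_loss = F.l1_loss(redegraded, ms_lr)
    # Spectral degradation assumed to be a band-average intensity vs. the PAN.
    intensity = fused.mean(dim=1, keepdim=True)
    spatial_loss = F.l1_loss(intensity, pan)
    return spectral_loss + spatial_loss

fused = torch.rand(1, 4, 64, 64, requires_grad=True)   # candidate HRMS
ms_lr = torch.rand(1, 4, 16, 16)                       # observed LR MS
pan = torch.rand(1, 1, 64, 64)                         # observed PAN
print(self_supervised_losses(fused, ms_lr, pan).item())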
{"title":"Reverse degradation for remote sensing pan-sharpening","authors":"Jiang He, Xiao Xiang Zhu","doi":"10.1016/j.jag.2026.105085","DOIUrl":"https://doi.org/10.1016/j.jag.2026.105085","url":null,"abstract":"Accurate pan-sharpening of multispectral images is essential for high-resolution remote sensing, yet supervised methods are limited by the need for paired training data and poor generalization. Existing unsupervised approaches often neglect the physical consistency between degradation and fusion and lack sufficient constraints, resulting in suboptimal performance in complex scenarios. We propose RevFus, a novel two-stage pan-sharpening framework. In the first stage, an invertible neural network models the degradation process and reverses it for fusion with cycle-consistency self-learning, ensuring a physically grounded mapping. In the second stage, structural detail compensation and spatial–spectral contrastive learning alleviate detail loss and enhance spectral–spatial fidelity. To further understand the network’s decision-making, we design a quantitative and systematic measure of model interpretability, the Interpretability Efficacy Coefficient (IEC). IEC integrates multiple statistics derived from SHapley Additive exPlanations (SHAP) values into a single unified score and try to evaluate how effectively a model balances spatial detail enhancement with spectral preservation. Experiments on three datasets demonstrate that RevFus outperforms state-of-the-art unsupervised and traditional methods, delivering superior spectral fidelity, enhanced spatial detail, and high model interpretability, thereby validating the effectiveness of the interpretable deep learning framework for robust, high-quality pan-sharpening.","PeriodicalId":50341,"journal":{"name":"International Journal of Applied Earth Observation and Geoinformation","volume":"395 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146047875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-21 DOI: 10.1016/j.jag.2026.105116
Enhancing Sentinel-2 landslide change detection by integrating multispectral, deformation, and topographic information
Bo Liu, Deren Li, Xiongwu Xiao, Zhenfeng Shao, Yingbing Li, Haobin Zhang, Yunong Chen, Zhenbei Zhang, Siyuan Wang, Boshen Chang
Landslides are among the most frequent natural hazards worldwide, posing severe risks to human lives and assets. Sentinel-2 imagery, a widely available remote sensing resource, has become an essential tool for detecting landslide changes. Nevertheless, its moderate spatial resolution often causes the loss of subtle surface information, which limits the precision of landslide recognition. To overcome this limitation, this study develops a Dynamic Multi-branch Landslide Change Detection Network (DMLCDNet) that integrates three types of multimodal data: multispectral, deformation, and topographic information. The proposed model achieves efficient landslide extraction through four key stages: preliminary change detection, dynamic feature extraction, feature fusion optimization, and spatial-semantic restoration. Experiments were conducted on three representative landslide events—Jiuzhaigou, Luding, and Longyan. The experimental findings indicate that DMLCDNet achieves improvements of 3.61% in F1-score and 6.91% in IoU over all baseline models, while sustaining efficient inference performance. In addition, evaluations on two other landslide cases confirm the model’s robust generalization ability. The study further emphasizes that the optimal selection of input data should follow three guiding principles: rich information, high discriminability, and strong relevance. Overall, the proposed method provides an effective paradigm for multisource feature fusion in Sentinel-2–based landslide change detection, offering substantial theoretical significance and practical value. The source code and dataset are publicly available at https://github.com/Trifurs/DMLCDNet.
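Fusing the three input branches can be pictured as a spatially adaptive gate that softmax-weights the multispectral, deformation, and topographic features per pixel; the branch layout and gating below are our assumptions about the fusion stage, not the published DMLCDNet layers.

import torch
import torch.nn as nn

class BranchFusion(nn.Module):
    def __init__(self, channels: int, branches: int = 3):
        super().__init__()
        # One score map per branch, predicted from all branches jointly.
        self.score = nn.Conv2d(branches * channels, branches, 1)

    def forward(self, feats):                  # list of (N, C, H, W) tensors
        stacked = torch.stack(feats, dim=1)               # (N, B, C, H, W)
        w = torch.softmax(self.score(torch.cat(feats, dim=1)), dim=1)
        return (w.unsqueeze(2) * stacked).sum(dim=1)      # (N, C, H, W)

spectral, deform, topo = (torch.randn(2, 32, 64, 64) for _ in range(3))
print(BranchFusion(32)([spectral, deform, topo]).shape)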
{"title":"Enhancing Sentinel-2 landslide change detection by integrating multispectral, deformation, and topographic information","authors":"Bo Liu, Deren Li, Xiongwu Xiao, Zhenfeng Shao, Yingbing Li, Haobin Zhang, Yunong Chen, Zhenbei Zhang, Siyuan Wang, Boshen Chang","doi":"10.1016/j.jag.2026.105116","DOIUrl":"https://doi.org/10.1016/j.jag.2026.105116","url":null,"abstract":"Landslides are among the most frequent natural hazards worldwide, posing severe risks to human lives and assets. Sentinel-2 imagery, a widely available remote sensing resource, has become an essential tool for detecting landslide changes. Nevertheless, its moderate spatial resolution often causes the loss of subtle surface information, which limits the precision of landslide recognition. To overcome this limitation, this study develops a Dynamic Multi-branch Landslide Change Detection Network (DMLCDNet) that integrates three types of multimodal data: multispectral, deformation, and topographic information. The proposed model achieves efficient landslide extraction through four key stages: preliminary change detection, dynamic feature extraction, feature fusion optimization, and spatial-semantic restoration. Experiments were conducted on three representative landslide events—Jiuzhaigou, Luding, and Longyan. The experimental findings indicate that DMLCDNet achieves improvements of 3.61% in F1-score and 6.91% in IoU over all baseline models, while sustaining efficient inference performance. In addition, evaluations on two other landslide cases confirm the model’s robust generalization ability. The study further emphasizes that the optimal selection of input data should follow three guiding principles: rich information, high discriminability, and strong relevance. Overall, the proposed method provides an effective paradigm for multisource feature fusion in Sentinel-2–based landslide change detection, offering substantial theoretical significance and practical value. The source code and dataset are publicly available at https://github.com/Trifurs/DMLCDNet.","PeriodicalId":50341,"journal":{"name":"International Journal of Applied Earth Observation and Geoinformation","volume":"7 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146047872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-21 DOI: 10.1016/j.jag.2026.105114
Semantic segmentation of single SAR imagery leveraging historical Sentinel-1 and Sentinel-2 data
Xiaoyan Lu, Qihao Weng
Optical and Synthetic Aperture Radar (SAR) images are widely used in land cover mapping, environmental monitoring, and disaster management. While optical sensors are often hindered by adverse weather conditions such as clouds and rain, SAR provides all-weather capability but suffers from lower interpretability. To overcome this limitation, historical optical data can be leveraged to improve SAR interpretation and enhance mapping accuracy. This study proposes a degradable multimodal fusion (DEMFuse) framework that leverages historical optical data to enhance the interpretation of single SAR imagery, implemented via semantic segmentation. DEMFuse is composed of a SAR-to-optical image generator and a degradable fusion model. The former is built upon an ImageNet-pretrained transformer that employs non-local generative modeling to translate SAR images into optical images and recover optical visual structural information. The latter progressively enhances fusion performance by continuously assimilating useful information from the synthesized optical data conditioned on the SAR data, thereby reducing the impact of artifacts in the synthesized data and ensuring effective fusion of the optical and SAR data. To demonstrate the effectiveness of the proposed DEMFuse framework, a globally distributed SAR-to-optical image translation dataset and a land-cover semantic segmentation dataset were constructed from 18,071 Sentinel-1 SAR and Sentinel-2 optical images, together referred to as "IT-SS-18K" (image translation-segmentation-18K). Additionally, a SAR-based rapid flood mapping dataset was employed to validate the effectiveness of the proposed framework in disaster scenarios. Experiments on IT-SS-18K demonstrated that DEMFuse significantly reduced the relative accuracy gap between SAR and optical data by over 30% for the segmentation of water, built areas, and trees. In the flood disaster scenario, the proposed framework yielded a 1.76% improvement over single-SAR segmentation. These findings suggest that DEMFuse can enhance SAR interpretation in scenarios where optical data are missing and improve the efficiency and accuracy of disaster management and environmental monitoring. The datasets will be made available at https://github.com/RCAIG/DEMFuse.
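A minimal reading of the "degradable" fusion is a learned per-pixel reliability gate on the synthesized-optical branch, letting the network fall back to SAR-only evidence wherever the translation produced artifacts; the gate design is our assumption, not the paper's architecture.

import torch
import torch.nn as nn

class DegradableFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Reliability map in [0, 1] predicted from both feature streams.
        self.reliability = nn.Sequential(
            nn.Conv2d(2 * channels, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, sar_feat, synth_opt_feat):
        r = self.reliability(torch.cat([sar_feat, synth_opt_feat], dim=1))
        return sar_feat + r * synth_opt_feat   # r -> 0 mutes bad synthesis

sar = torch.randn(1, 64, 128, 128)
opt = torch.randn(1, 64, 128, 128)
print(DegradableFusion(64)(sar, opt).shape)    # torch.Size([1, 64, 128, 128])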
{"title":"Semantic segmentation of single SAR imagery leveraging historical Sentinel-1 and Sentinel-2 data","authors":"Xiaoyan Lu, Qihao Weng","doi":"10.1016/j.jag.2026.105114","DOIUrl":"https://doi.org/10.1016/j.jag.2026.105114","url":null,"abstract":"Optical and Synthetic Aperture Radar (SAR) images are widely used in land cover mapping, environmental monitoring, and disaster management. While optical sensors are often hindered by adverse weather conditions such as clouds and rain, SAR provides all-weather capability but suffers from lower interpretability. To overcome this limitation, historical optical data can be leveraged to improve SAR interpretation and enhance mapping accuracy. This study proposed a degradable multimodal fusion (DEMFuse) framework that leverages historical optical data to enhance the interpretation of single SAR imagery and implemented it via semantic segmentation. The DEMFuse is composed of a SAR-to-optical image generator and a degradable fusion model. The former was built upon an ImageNet-pretrained transformer that employed a non-local generative modeling for translating SAR images into optical images to recover optical visual structural information. The latter was proposed to achieve progressively enhanced fusion performance by continuously assimilating useful information from synthesized optical data based on SAR data, thus reducing the impact of artifacts in the synthesized data, and ensuring the effective fusion of the optical and SAR data. To demonstrate the effectiveness of the proposed DEMFuse framework, a globally distributed SAR-to-optical image translation dataset and a land-cover semantic segmentation dataset were constructed by using 18,071 Sentinel-1 SAR and Sentinel-2 optical images, together referred to as “IT-SS-18 K” (image translation-segmentation-18 K). Additionally, a SAR-based flood rapid mapping dataset was employed to validate the effectiveness of the proposed framework in disaster scenarios. Experiments on IT-SS-18 K demonstrated that the proposed DEMFuse framework significantly reduced the relative accuracy gap by over 30 % between SAR and optical data for the segmentation of water, built areas, and trees. In the flood disaster scenario, the proposed framework yielded a 1.76 % improvement compared to the single SAR segmentation. This finding suggests that DEMFuse can be employed to enhance SAR interpretation in scenarios where optical data is missing, and improve the efficiency and accuracy of disaster management and environmental monitoring. The datasets will be made available at <ce:inter-ref xlink:href=\"https://github.com/RCAIG/DEMFuse\" xlink:type=\"simple\">https://github.com/RCAIG/DEMFuse</ce:inter-ref>.","PeriodicalId":50341,"journal":{"name":"International Journal of Applied Earth Observation and Geoinformation","volume":"40 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146047874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-20 DOI: 10.1016/j.jag.2026.105109
A review of satellite remote sensing for pollution control and carbon reduction in China
Shaohua Zhao, Yipeng Yang, Lin Ma, Zhijie Bai, Youcan Feng, Quanhai Liu, Qiao Wang, Yunhan Chen, Fei Wang, Jiahua Teng, Linbo Zhao, Yuhao Xie, Yazhen Dai, Le Yu, Yanqin Zhou, Yujiu Xiong
Mitigating pollutant and greenhouse gas (GHG) emissions is a core strategic imperative for China, necessitating integrated technological solutions. Satellite remote sensing (SRS) has emerged as a pivotal tool for environmental monitoring, enhancing our understanding of pollutant and GHG emissions from space and supporting policy-making. Here we summarize recent research advances regarding the role of SRS in monitoring pollutant and GHG emissions in China, focusing on three key applications: 1) pollutant emission tracking, 2) carbon emission quantification, and 3) synergistic monitoring of pollution-carbon interactions. Our analysis identifies substantial progress over the past decades, but critical limitations remain. From a scientific perspective, both ground observation networks and professional sensors are lacking for improving retrieval accuracy, and artificial intelligence (AI)-based data-mining theory is inadequate. From a practical perspective, high-resolution satellites provide insufficient coverage for nationwide carbon monitoring, and data deficiencies hinder intelligent monitoring systems. To address these challenges, we propose the following strategic priorities: 1) establishing integrated Space-Air-Ground observation systems, 2) accelerating the deployment of next-generation monitoring satellites and developing related retrieval algorithms, and 3) developing AI-based monitoring systems through multisource data integration. This review enhances the understanding of SRS applications and provides directions for future research aimed at mitigating pollutant and GHG emissions, thereby supporting China's dual carbon goals and sustainable development worldwide.
{"title":"A review of satellite remote sensing for pollution control and carbon reduction in China","authors":"Shaohua Zhao, Yipeng Yang, Lin Ma, Zhijie Bai, Youcan Feng, Quanhai Liu, Qiao Wang, Yunhan Chen, Fei Wang, Jiahua Teng, Linbo Zhao, Yuhao Xie, Yazhen Dai, Le Yu, Yanqin Zhou, Yujiu Xiong","doi":"10.1016/j.jag.2026.105109","DOIUrl":"https://doi.org/10.1016/j.jag.2026.105109","url":null,"abstract":"Mitigating pollutants and greenhouse gas (GHG) emissions is a core strategic imperative for China, necessitating integrated technological solutions. Satellite remote sensing (SRS) has emerged as a pivotal tool for environmental monitoring, enhancing our understanding of pollutants and GHG emissions from space and supports policy-making. Here we summarize the recent research advances regarding the role of SRS in monitoring pollutants and GHG emissions in China, focusing on three key applications: 1) pollutant emission tracking, 2) carbon emission quantification, and 3) synergistic monitoring of pollution-carbon interactions. Current analysis identifies substantial progress achieved over the past decades, but critical limitations remain. From a scientific perspective, both ground observation networks and professional sensors are lacking for improving retrieval accuracy. Moreover, artificial intelligence (AI)-based data-mining theory is inadequate. From a practical perspective, high-resolution satellite provides insufficient coverage for nationwide carbon monitoring and data deficiencies hinders intelligent monitoring systems. To address these challenges, we provide the following strategic priorities: 1) establishing integrated Space-Air-Ground observation systems, 2) accelerating the deployment of next-generation monitoring satellites and developing related retrieval algorithms, and 3) developing AI-based monitoring systems through multisource data integration. This review enhances the understanding of SRS applications and provides directions for future research aimed at mitigating pollutant and GHG emissions, thereby supporting China’s dual carbon goals and sustainable development worldwide.","PeriodicalId":50341,"journal":{"name":"International Journal of Applied Earth Observation and Geoinformation","volume":"18 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146047914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-20 DOI: 10.1016/j.jag.2026.105111
A new optimization framework for the geometric positioning accuracy of optical images
Ming Li, Dazhao Fan, Yang Dong, Song Ji, Dongzi Li, Jiaqi Yang, Aosheng Wang, Baosheng Zhang
Obtaining accurate geometric positioning information for images is crucial for the application of remote sensing images in stereo mapping, map production, and target positioning. However, for optical remote sensing satellite images and archived historical images, the geometric positioning methods vary, and different control data formats are needed. This limits the timeliness of applying optical remote sensing images and has become a bottleneck for the unified and standardized processing of such images. In response, this paper presents a framework for optimizing the geometric positioning accuracy of optical remote sensing images. A lightweight vector control library (LVCL) is constructed to serve as the control data for the geometric positions of the optical images, and a rough positioning method based on a mutual-feedback constraint-matching strategy and multiple associated images is proposed to quickly obtain approximate positions of the target images. Moreover, vector matching methods based on optimal iteration and segmented connection are proposed to accurately match the target image to images in the LVCL, thereby providing the target image's geographical information based on the control data. Experiments conducted across multiple regions show that this framework can effectively match remote sensing images with the geographic coordinates in the LVCL: the data storage volume is only 1/10 to 1/3893 of that of traditional and existing lightweight methods, demonstrating a clear lightweight advantage; among the images from 13 cities, the correct rough positioning matches for 12 cities rank among the top 10, with seven ranking first; the number of vector construction nodes in 62 regions is better than that of the comparison strategy; the vector matching results are correct and yield the largest number of matching points; and for the 13 regions, the average positioning deviation decreases from 156.590 m to 4.098 m and the maximum deviation from 582.142 m to 10.588 m, with systematic errors significantly corrected. Thus, the proposed framework is expected to become a new paradigm for obtaining the geometric positions of optical remote sensing images.
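The final correction step, mapping matched image points onto the geo-referenced vectors of the LVCL, amounts to fitting a coordinate transform by least squares. A 2-D affine sketch is given below; the paper's matching pipeline itself is considerably more elaborate.

import numpy as np

def fit_affine(src, dst):
    """src, dst: (N, 2) matched points; returns a 2x3 affine matrix."""
    A = np.hstack([src, np.ones((len(src), 1))])      # (N, 3) design matrix
    coef, *_ = np.linalg.lstsq(A, dst, rcond=None)    # (3, 2) solution
    return coef.T                                     # (2, 3)

def apply_affine(M, pts):
    return pts @ M[:, :2].T + M[:, 2]

img_pts = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
gt_pts = img_pts + np.array([156.6, -42.3])           # a systematic offset
M = fit_affine(img_pts, gt_pts)
residual = np.linalg.norm(apply_affine(M, img_pts) - gt_pts, axis=1)
print(residual.max())                                 # ~0: bias removed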
{"title":"A new optimization framework for the geometric positioning accuracy of optical images","authors":"Ming Li, Dazhao Fan, Yang Dong, Song Ji, Dongzi Li, Jiaqi Yang, Aosheng Wang, Baosheng Zhang","doi":"10.1016/j.jag.2026.105111","DOIUrl":"https://doi.org/10.1016/j.jag.2026.105111","url":null,"abstract":"Obtaining accurate geometric positioning information for images is crucial for the application of remote sensing images in stereo mapping, map production, and target positioning. However, for optical remote sensing satellite images and archived historical images, the geometric positioning methods vary, and different control data formats are needed. This limits the timeliness of applying optical remote sensing images and has become a bottleneck for the unified and standardized processing of such images. In response, this paper presents a framework for optimizing the geometric positioning accuracy of optical remote sensing images. A lightweight vector control library (LVCL) is constructed to serve as the control data for the geometric positions of the optical images, and a rough positioning method based on a mutual feedback constraint-matching strategy and multiple associated images is proposed to obtain approximate positions of the target images quickly. Moreover, optimal iteration and segmented connection based vector matching methods are proposed to accurately match the target image to images in the LVCL, thereby providing the target image’s geographical information based on the control data. Experiments conducted across multiple regions show that this framework can effectively match remote sensing images with the geographic coordinates in the LVCL: the data storage volume is only 1/10 to 1/3893 of that of traditional and existing lightweight methods, demonstrating prominent lightweight advantages; among the images from 13 cities, the correct rough positioning matching results for 12 cities rank among the top 10, with 7 achieving the first place; the number of vector construction nodes in 62 regions is better than that of the comparison strategy; the vector matching results are correct and the number of matching points is the largest; for the 13 regions, the average positioning deviation decreases from 156.590 m to 4.098 m, and the maximum deviation decreases from 582.142 m to 10.588 m, with systematic errors significantly corrected. Thus, the proposed framework is expected to become a new paradigm for obtaining the geometric positions of optical remote sensing images.","PeriodicalId":50341,"journal":{"name":"International Journal of Applied Earth Observation and Geoinformation","volume":"100 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146047877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}