Pub Date: 2025-12-09 DOI: 10.1109/LGRS.2025.3640124
Zhanyuan Liang;Xiaoyu Zhang;Guoqiang Shen;Zhentao Wang;Xiping Wang
Reflection waveform inversion (RWI) updates the low- to mid-wavenumber components of the velocity model accurately by projecting the waveform errors between the observed and synthetic data onto the reflection wave paths. However, the synthetic data, generated with the aid of migration/demigration, exhibit unexpected waveform deviations from the observed data due to unknown source wavelets, potentially interfering with inversion outcomes. To address this issue, we propose a source-independent RWI (SI-RWI) method. Initially, the equivalent source spectrum of the migration/demigration process is derived in the frequency domain. Subsequently, the misfit function for RWI is designed to ensure that the observed and synthetic data share the same equivalent source spectrum. Finally, based on this novel misfit function, an RWI method is formulated that does not rely on the phase distortions of source wavelets. The proposed approach has been demonstrated successfully using 2-D examples.
Title: Source Independent Reflection Waveform Inversion. Journal: IEEE Geoscience and Remote Sensing Letters, vol. 23, pp. 1-5.
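The abstract does not state the misfit itself. As an illustration only, the sketch below uses a well-known convolution-based source-independent construction (not necessarily the authors' formulation): each observed trace is convolved with a synthetic reference trace and each synthetic trace with the observed reference trace, so both terms share the same combined source spectrum. The Ricker wavelets, the spike "Green's functions", and the names `ricker` and `si_misfit` are assumptions made for the example.

```python
import numpy as np

def ricker(f0, dt, n):
    """Ricker wavelet with peak frequency f0 (Hz), centred in an n-sample window."""
    t = (np.arange(n) - n // 2) * dt
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def si_misfit(d_obs, d_syn, ref=0):
    """Convolution-based source-independent misfit (illustrative).

    d_obs, d_syn: arrays of shape (n_traces, n_samples).
    Observed traces are convolved with the synthetic reference trace and
    synthetic traces with the observed reference trace, so both terms carry
    the product of the two source spectra.
    """
    total = 0.0
    for i in range(d_obs.shape[0]):
        a = np.convolve(d_obs[i], d_syn[ref])
        b = np.convolve(d_syn[i], d_obs[ref])
        total += 0.5 * np.sum((a - b) ** 2)
    return total

# Hypothetical toy data: two traces, each a pair of reflection spikes.
n = 200
g = np.zeros((2, n))
g[0, [40, 90]] = [1.0, -0.5]
g[1, [55, 120]] = [0.8, 0.4]

w_true = ricker(25.0, 0.002, 64)   # "unknown" field wavelet
w_guess = ricker(15.0, 0.002, 64)  # wrong wavelet used for modelling

d_obs = np.array([np.convolve(tr, w_true) for tr in g])
d_syn = np.array([np.convolve(tr, w_guess) for tr in g])

plain = 0.5 * np.sum((d_obs - d_syn) ** 2)   # ordinary least-squares misfit
robust = si_misfit(d_obs, d_syn)             # source-independent misfit
print(plain, robust)
```

Because the model is identical in both data sets and only the wavelet differs, the ordinary misfit is clearly nonzero while the convolved misfit vanishes to rounding error, which is the behaviour a source-independent objective is meant to have.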
As oil and gas exploration advances toward deeper and more complex geological formations, precise characterization of subsurface structures has become increasingly important. The effectiveness of noise suppression is a critical determinant of the quality of subsequent inversion and imaging. In recent years, deep learning methods have attracted significant attention and been widely applied in seismic denoising, primarily because of their data-driven nature. Although conventional deep learning implementations have achieved notable denoising performance, they suffer from inherent limitations, including incomplete noise reduction and potential signal degradation. To address these challenges, this study proposes a multiround SCU-Net (MR-SCU) denoising approach. The MR-SCU method, based on SCU-Net, employs noise as labeled data to generate an initial denoised result in the first round. In each subsequent round, the previous denoising result is used as input, and the residual between the labeled and predicted data serves as the label. The rounds are repeated iteratively to denoise more thoroughly while protecting the effective signal. Using SSIM (structural similarity index measure) as the loss function further improves the method's precision in detail-oriented denoising tasks. Numerical experiments on synthetic data and on field data acquired from a region in western China substantiate the efficacy of MR-SCU, demonstrating that it delivers superior denoising performance while preserving valuable seismic information.
Pub Date: 2025-12-05 DOI: 10.1109/LGRS.2025.3640968
Yuli Qi;Guoxin Chen;Jinxin Chen;Jun Li;Rongsen Du;Haiyang Lu;Naijian Wang;Xingguo Huang
Title: Seismic Denoising via Multiround SCU-Net. Journal: IEEE Geoscience and Remote Sensing Letters, vol. 23, pp. 1-5.
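To make the SSIM loss mentioned in the abstract concrete, here is a minimal sketch. It uses a single global window over the whole patch, whereas practical SSIM implementations average over local sliding windows; the constants `k1` and `k2` are the conventional defaults, and the function names and the toy sinusoidal section are assumptions for illustration, not the authors' code.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM computed over one global window (a simplification of windowed SSIM)."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2.0 * mx * my + c1) * (2.0 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den

def ssim_loss(pred, target, data_range=1.0):
    """Loss that is 0 for identical inputs and grows as structure diverges."""
    return 1.0 - global_ssim(pred, target, data_range)

# Hypothetical toy "seismic section": a smooth event plus random noise.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 8.0 * np.pi, 64))[None, :] * np.ones((32, 1))
noisy = clean + 0.5 * rng.normal(size=clean.shape)

loss_same = ssim_loss(clean, clean, data_range=2.0)
loss_noisy = ssim_loss(noisy, clean, data_range=2.0)
print(loss_same, loss_noisy)
```

In a training loop this loss would be evaluated on the network prediction and its label in each round; an automatic-differentiation framework would be needed in place of NumPy for that.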
Pub Date: 2025-12-01 DOI: 10.1109/LGRS.2025.3639147
Guoyu Zhou;Jing Zhang;Yi Yan;Hui Zhang;Li Zhuo
Accurate semantic segmentation of urban remote sensing images (URSIs) is essential for urban planning and environmental monitoring. However, it remains challenging due to the subtle texture differences and similar spatial structures among geospatial objects, which cause semantic ambiguity and misclassification. Additional complexities arise from irregular object shapes, blurred boundaries, and overlapping spatial distributions of objects, resulting in diverse and intricate edge morphologies. To address these issues, we propose TEFormer, a texture-aware and edge-guided Transformer. Our model features a texture-aware module (TaM) in the encoder to capture fine-grained texture distinctions between visually similar categories, thereby enhancing semantic discrimination. The decoder incorporates an edge-guided tri-branch decoder (Eg3Head) to preserve local edges and details while maintaining multiscale context-awareness. Finally, an edge-guided feature fusion module (EgFFM) effectively integrates contextual, detail, and edge information to achieve refined semantic segmentation. Extensive evaluation demonstrates that TEFormer yields mean intersection over union (mIoU) scores of 88.57% on Potsdam and 81.46% on Vaihingen, exceeding the next best methods by 0.73% and 0.22%. On the LoveDA dataset, it secures the second position with an overall mIoU of 53.55%, trailing the optimal performance by a narrow margin of 0.19%.
Title: TEFormer: Texture-Aware and Edge-Guided Transformer for Semantic Segmentation of Urban Remote Sensing Images. Journal: IEEE Geoscience and Remote Sensing Letters, vol. 23, pp. 1-5.
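The mIoU figures quoted above are averages of per-class intersection over union. A minimal sketch of that metric, computed from a confusion matrix, is given below; the tiny label maps are made up for the example and have nothing to do with the Potsdam, Vaihingen, or LoveDA data.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the reference class, columns the predicted class."""
    idx = y_true.ravel() * n_classes + y_pred.ravel()
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)

def mean_iou(cm):
    """Mean of per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    return float((tp[valid] / union[valid]).mean())

# Hypothetical 2x3 label maps with three classes.
y_true = np.array([[0, 0, 1], [1, 2, 2]])
y_pred = np.array([[0, 1, 1], [1, 2, 0]])

cm = confusion_matrix(y_true, y_pred, n_classes=3)
miou = mean_iou(cm)
print(cm)
print(miou)  # per-class IoU is 1/3, 2/3, 1/2, so the mean is 0.5
```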
Pub Date: 2025-12-01 DOI: 10.1109/LGRS.2025.3639172
Fernando G. Marques;Carlos A. Astudillo;Alan Souza;Daniel Miranda;Edson Borin
We apply the self-supervised learning (SSL) technique of vision transformer masked autoencoders (ViT MAEs) to produce a ViT feature-extractor backbone for neural networks that take seismic data as input. We then evaluate the quality of these backbones by coupling them to a simple linear prediction head and fine-tuning the resulting models on a seismic semantic segmentation task. We compare the domain-specific ViT MAE against cross-domain pretrained and randomly initialized ViTs and show that it yields superior performance in low-data regimes. We also demonstrate that pretraining loss correlates with downstream performance, supporting its use as a proxy for feature quality.
Title: Applying ViT Masked Autoencoders to Seismic Data for Feature Extraction and Few-Shot Learning. Journal: IEEE Geoscience and Remote Sensing Letters, vol. 23, pp. 1-5.
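The core of masked-autoencoder pretraining is to split the input into patches, hide most of them, and score the reconstruction only on the hidden ones. The sketch below shows just that data handling on a random array standing in for a seismic section. The patch size, the 75% mask ratio (the common MAE default, not a value from the abstract), and all function names are assumptions; the encoder and decoder are omitted.

```python
import numpy as np

def patchify(img, p):
    """Split an (H, W) image into non-overlapping p x p patches, one per row."""
    h, w = img.shape
    assert h % p == 0 and w % p == 0, "image size must be a multiple of p"
    return img.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)

def random_mask(n_patches, mask_ratio, rng):
    """Return indices of visible patches and a boolean mask (True = hidden)."""
    n_keep = int(round(n_patches * (1.0 - mask_ratio)))
    keep = np.sort(rng.permutation(n_patches)[:n_keep])
    mask = np.ones(n_patches, dtype=bool)
    mask[keep] = False
    return keep, mask

def masked_mse(pred, target, mask):
    """Reconstruction loss evaluated on hidden patches only."""
    return float(((pred[mask] - target[mask]) ** 2).mean())

rng = np.random.default_rng(0)
section = rng.normal(size=(32, 32))        # stand-in for a seismic image
patches = patchify(section, p=8)           # 16 patches of 64 samples each
keep, mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)

visible = patches[keep]                    # what the encoder would see
pred = np.zeros_like(patches)              # placeholder for the decoder output
loss = masked_mse(pred, patches, mask)
print(patches.shape, visible.shape, int(mask.sum()), loss)
```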
Pub Date: 2025-11-25 DOI: 10.1109/LGRS.2025.3637108
Fang Ouyang;Jianguo Zhao;Xinze Liu;Bin Wang;Yu Zhang;Bohong Yan
Rock-physics theories and experiments have demonstrated that seismic wave velocity dispersion and attenuation are closely related to hydrocarbon deposits. To obtain the velocity at different seismic frequencies, the frequency-dependent amplitude variation with angle (AVA) inversion method was developed to invert the dispersive velocity from frequency-domain P-wave reflection coefficients. Such a method overcomes the inability of conventional AVA inversion to account for seismic dispersion. However, it considers only velocity dispersion and neglects the effects of seismic attenuation. In this letter, we propose a new frequency-dependent AVA method in which the reservoir thickness and the complex dispersive P-wave velocity, which carries information on both dispersion and attenuation, are inverted simultaneously. To better capture the characteristics of reflections and transmissions between layers, the reflectivity method is adopted as the forward modeling engine. Furthermore, a modified simulated annealing method is developed that takes advantage of the parameter-by-parameter optimization idea of the heat-bath algorithm as well as the acceptance criterion of the Metropolis algorithm, so as to achieve efficient and better global optimization for this highly nonlinear and ill-posed inversion problem. Compared with previous frequency-dependent AVA methods, our improved approach predicts not only the P-wave velocity dispersion but also the frequency-dependent inverse quality factor of the reservoir layer. The effectiveness and applicability of the proposed method for hydrocarbon indication are verified using synthetic records and field data through a drilling well.
Title: Stochastic Frequency-Dependent Velocity and Attenuation Inversion for Hydrocarbon Detection. Journal: IEEE Geoscience and Remote Sensing Letters, vol. 23, pp. 1-5.
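The optimizer described above combines two ideas: perturbing one parameter at a time and accepting trials with the Metropolis rule. The sketch below shows only that combination on a toy objective. It is not the authors' modified algorithm: the misfit function, the three "model parameters" and their values, the cooling schedule, and the name `anneal` are all hypothetical, and the reflectivity forward modeling is replaced by a simple analytic function.

```python
import numpy as np

def anneal(misfit, x0, step, t0=1.0, cooling=0.95, n_sweeps=200, seed=0):
    """Simulated annealing that updates one parameter at a time.

    Each trial perturbs a single component and is accepted with the
    Metropolis rule: always if the misfit drops, otherwise with
    probability exp(-dE / T).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    e = misfit(x)
    best_x, best_e = x.copy(), e
    t = t0
    for _ in range(n_sweeps):
        for k in range(x.size):                    # parameter-by-parameter sweep
            trial = x.copy()
            trial[k] += step[k] * rng.normal()
            e_trial = misfit(trial)
            if e_trial <= e or rng.random() < np.exp(-(e_trial - e) / t):
                x, e = trial, e_trial
                if e < best_e:
                    best_x, best_e = x.copy(), e
        t *= cooling                               # geometric cooling schedule
    return best_x, best_e

def toy_misfit(m):
    """Hypothetical multimodal misfit over (thickness, Re(v), Im(v))."""
    target = np.array([20.0, 3000.0, 40.0])
    scale = np.array([5.0, 200.0, 10.0])
    z = (m - target) / scale
    return float(np.sum(z ** 2) + 0.3 * np.sum(1.0 - np.cos(4.0 * z)))

x0 = np.array([30.0, 2800.0, 20.0])
step = 0.3 * np.array([5.0, 200.0, 10.0])
best_x, best_e = anneal(toy_misfit, x0, step)
print(best_x, best_e)
```

Tracking the best model seen so far, rather than returning the final state, is a design choice of this sketch; it guarantees the returned misfit is never worse than the starting one.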
Pub Date: 2025-11-24 DOI: 10.1109/LGRS.2025.3636279
Yier Yan;Zhibin Liang;Changhong Liu;Tao Zou
With the rapid development of uncrewed aerial vehicle (UAV) technology, UAVs have provided an innovative solution for floating debris monitoring. However, object detection in UAV images remains challenging due to high miss rates for small objects, insufficient low-level feature extraction, and computational redundancy. This letter proposes an efficient floating debris detection model based on YOLOv8n, named EFD-you only look once (YOLO), to address these issues. First, the edge fusion stem (EFStem) module is proposed to enhance low-level feature extraction through an integrated gate-attention mechanism. Second, the multibranch efficient reparameterization block (MBERB) is designed to achieve efficient cross-layer feature fusion. Experimental results demonstrate that compared to YOLOv8n, our model achieves a 6.3% improvement in mean average precision (mAP) on the UAV floating debris dataset, while simultaneously reducing parameters by 26.7% and improving small object recall by 21.9%. The inference time of EFD-YOLO on the RK3588 edge device is as low as 30.5 ms, demonstrating real-time capability.
Title: EFD-YOLO: An Improved YOLOv8 Network for River Floating Debris Object Detection. Journal: IEEE Geoscience and Remote Sensing Letters, vol. 23, pp. 1-5.
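The abstract does not describe the internal layout of the MBERB, so the following is only a generic sketch of structural reparameterization, the idea its name refers to: parallel linear branches used during training can be folded into one convolution kernel for inference. The three branches chosen here (3x3, 1x1, identity), the absence of batch normalization, and the function names are assumptions for the example.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1 cross-correlation with zero 'same' padding.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k.
    """
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((w.shape[0], h, wd))
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out

def fuse_branches(w3, w1):
    """Fold a 1x1 branch and an identity branch into the 3x3 kernel.

    Both contribute only to the centre tap; the identity branch requires
    C_in == C_out.
    """
    fused = w3.copy()
    fused[:, :, 1, 1] += w1[:, :, 0, 0]
    fused[:, :, 1, 1] += np.eye(w3.shape[0])
    return fused

rng = np.random.default_rng(0)
c = 4
x = rng.normal(size=(c, 10, 12))
w3 = rng.normal(size=(c, c, 3, 3))
w1 = rng.normal(size=(c, c, 1, 1))

multi_branch = conv2d_same(x, w3) + conv2d_same(x, w1) + x   # training-time form
single_branch = conv2d_same(x, fuse_branches(w3, w1))        # inference-time form
print(np.abs(multi_branch - single_branch).max())
```

The two forms give the same output, but the fused one needs a single convolution at inference time, which is the kind of saving that matters on an edge device.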
To be effective, ecosystem and habitat conservation must not only look at past losses but also understand the effects of current and future decisions on landscapes. Here, we present a transformative, user-driven land cover change prediction tool designed to aid land planners in strategic decision-making for conservation and habitat protection. Within an integrated map-based prediction pipeline, the tool uses machine learning (ML) and deep learning (DL) models to classify satellite images and make predictions of near-term land cover changes. The tool facilitates user interaction with a cloud-hosted ML model, making it accessible to nontechnical users for generating map-based predictions using big data. The tool’s key strength lies in its dynamic variable adjustment feature, empowering users to tailor scenarios related to potential future development planning. Through the integration of cloud-hosted ML and DL models with a user-centric interface, the tool has the potential to allow stakeholders and land planners to make informed decisions, actively minimizing habitat destruction and aligning with broader conservation objectives. We tested our approach in the context of central Texas, USA to evaluate its effectiveness in diverse conservation scenarios, with an average overall accuracy of 88% for the land cover class maps over four years and over 72% for the five-year land cover change prediction. While our approach has the potential to improve land management and planning for conservation, we also acknowledge the importance of rigorous model validation and ongoing refinement and highlight the need for technological advancement to be developed with strong stakeholder engagement.
Pub Date: 2025-11-24 DOI: 10.1109/LGRS.2025.3636286
Pui-Yu Ling;Laura Nunes;Jonathan Srinivasan;Nasir Popalzay;Palmer Wilson;Jameson Quisenberry;Alex Borowicz
Title: User-Driven Land Cover Change Prediction Map Tool for Land Conservation Planning. Journal: IEEE Geoscience and Remote Sensing Letters, vol. 23, pp. 1-5. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11265796
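The accuracy figures above are overall accuracies, and a change prediction is naturally expressed as a per-pixel class transition. A minimal sketch of both, on made-up class maps, might look like this; the transition encoding (`before * n_classes + after`, with -1 for unchanged pixels) is an assumption for illustration and not taken from the tool.

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    """Fraction of pixels whose predicted class matches the reference."""
    return float((np.asarray(y_true) == np.asarray(y_pred)).mean())

def change_map(before, after, n_classes):
    """Encode each pixel's transition as before * n_classes + after.

    Pixels whose class did not change are marked -1.
    """
    code = before * n_classes + after
    return np.where(before == after, -1, code)

# Hypothetical 2x2 land cover maps with three classes (0, 1, 2).
before = np.array([[0, 0], [1, 2]])
after = np.array([[0, 1], [1, 1]])

changes = change_map(before, after, n_classes=3)
acc = overall_accuracy([0, 1, 1, 2], [0, 1, 2, 2])
print(changes)  # 0 -> 1 is coded 1, 2 -> 1 is coded 7
print(acc)      # three of four pixels agree: 0.75
```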
Remote sensing image segmentation is particularly difficult due to the coexistence of large-scale variations and fine-grained structures in very high-resolution imagery. Conventional CNN-based or transformer-based networks often struggle to capture global context while preserving boundary details, leading to degraded performance on small or thin objects. To address these challenges, we propose a self-prompt calibration network based on segment anything model 2 (SC-SAM). SC-SAM achieves self-prompting by feeding mask prompts from a lightweight decoder into the frozen prompt encoder. Output calibration is achieved through the proposed cross-probability guided calibration (CPGC) module, which employs cross-probability uncertainty as complementary guidance to refine final predictions via self-prompted outputs. Furthermore, to better preserve contextual and structural information across multiple scales, a scale-decoupled kernel mixture (SDKM) module is designed. Experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate that the proposed approach surpasses the state-of-the-art methods by 1.02% and 1.34% in mIoU, respectively, highlighting its effectiveness. This study provides new insights into adapting SAM for domain-specific remote sensing segmentation tasks.
Pub Date: 2025-11-24 DOI: 10.1109/LGRS.2025.3636177
Yizhou Lan;Daoyuan Zheng;Xinge Zhao;Ke Shang;Feizhou Zhang
Title: A Self-Prompt Calibration Network Based on Segment Anything Model 2 for High-Resolution Remote Sensing Image Segmentation. Journal: IEEE Geoscience and Remote Sensing Letters, vol. 23, pp. 1-5.
Pub Date: 2025-11-24 DOI: 10.1109/LGRS.2025.3636165
Nguyen Anh Tu;Nursultan Makhanov;Kenzhebek Taniyev;Ton Duc Do
Aerial video captioning (VC) facilitates the automatic interpretation of dynamic scenes in remote sensing (RS), supporting critical applications, such as disaster response, traffic monitoring, and environmental surveillance. However, challenges, such as extreme angles and continuous camera motion, require adaptive modeling of complex temporal relationships. To tackle these challenges, we leverage an image-language model as the vision encoder and introduce a temporal adaptation module that combines convolution with self-attention layers to both capture local semantics across neighboring frames and model global temporal dependencies. This design allows our model to exploit the multimodal knowledge of the vision encoder while effectively reasoning over the spatiotemporal dynamics. In addition, privacy concerns often restrict access to annotated aerial datasets, posing further challenges for model training. To address this, we develop a federated learning (FL) framework that enables collaborative model training across decentralized clients. Within this framework, we establish a unified benchmark for systematic comparison of temporal adapters, text decoders, and FL strategies, hence filling a gap in the existing literature. Extensive experiments validate the robustness of our approach and its potential for advancing aerial VC.
{"title":"Federated Aerial Video Captioning With Effective Temporal Adaptation","authors":"Nguyen Anh Tu;Nursultan Makhanov;Kenzhebek Taniyev;Ton Duc Do","doi":"10.1109/LGRS.2025.3636165","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3636165","url":null,"abstract":"Aerial video captioning (VC) facilitates the automatic interpretation of dynamic scenes in remote sensing (RS), supporting critical applications, such as disaster response, traffic monitoring, and environmental surveillance. However, challenges, such as extreme angles and continuous camera motion, require adaptive modeling of complex temporal relationships. To tackle these challenges, we leverage an image-language model as the vision encoder and introduce a temporal adaptation module that combines convolution with self-attention layers to both capture local semantics across neighboring frames and model global temporal dependencies. This design allows our model to exploit the multimodal knowledge of the vision encoder while effectively reasoning over the spatiotemporal dynamics. In addition, privacy concerns often restrict access to annotated aerial datasets, posing further challenges for model training. To address this, we develop a federated learning (FL) framework that enables collaborative model training across decentralized clients. Within this framework, we establish a unified benchmark for systematic comparison of temporal adapters, text decoders, and FL strategies, hence filling a gap in the existing literature. Extensive experiments validate the robustness of our approach and its potential for advancing aerial VC.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145674849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
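The collaborative training across decentralized clients described in the abstract above can be illustrated with a minimal aggregation sketch. The abstract does not name the FL strategies it compares, so the example assumes plain FedAvg (a sample-count-weighted average of client parameters) as a common baseline. The function name `fedavg`, the parameter name `adapter.weight`, and all numbers are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: each client sends a flat list of floats per parameter name.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical clients holding 1 and 3 annotated videos, respectively.
clients = [
    {"adapter.weight": [1.0, 2.0]},
    {"adapter.weight": [3.0, 6.0]},
]
global_params = fedavg(clients, client_sizes=[1, 3])
```

Only parameters leave each client in this scheme; the annotated videos stay local, which is the privacy property that motivates the federated setting.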
Pub Date : 2025-11-24 DOI: 10.1109/LGRS.2025.3636236
Qi Zeng;Wanchun Zhang;Jie Cheng
This study develops an integrated framework for estimating all-sky surface longwave downward radiation (SLDR) from the Medium Resolution Spectral Imager-II (MERSI-II) onboard the Fengyun-3D (FY-3D) satellite. The framework comprises a hybrid method for clear-sky SLDR estimation and a cloud base temperature (CBT)-based single-layer cloud model (SLCM) for cloudy-sky SLDR estimation. In situ validation indicates that the hybrid method yields a bias/RMSE of −0.78/21.70 W/m², whereas the SLCM achieves a bias/RMSE of 5.79/23.61 W/m². The bias/RMSE of the all-sky SLDR is 3.37/22.93 W/m². The estimated all-sky instantaneous SLDR was combined with ERA5 temporal information to derive daily SLDR using a bias-corrected sinusoidal integration method, yielding a bias of 0.04 W/m² and an RMSE of 16.77 W/m². These results demonstrate the robustness of the proposed framework and its substantial potential for generating both instantaneous and daily SLDR products at 1-km spatial resolution.
{"title":"An Integrated Framework for Estimating the All-Sky Surface Downward Longwave Radiation From FY-3D/MERSI-II Imagery","authors":"Qi Zeng;Wanchun Zhang;Jie Cheng","doi":"10.1109/LGRS.2025.3636236","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3636236","url":null,"abstract":"This study develops an integrated framework for all-sky surface longwave downward radiation (SLDR) estimate for the medium resolution spectral imager-II (MERSI-II) onboard the Fengyun-3D (FY-3D) satellite. The framework comprises a hybrid method for the clear-sky SLDR estimate and a cloud base temperature (CBT)-based single-layer cloud model (SLCM) for the cloudy-sky SLDR estimate. In situ validation indicates that the hybrid method yields a bias/RMSE of −0.78/21.70 W/m2, whereas the SLCM achieves a bias/RMSE of 5.79/23.61 W/m2. The bias/RMSE of the all-sky SLDR is 3.37/22.93 W/m2. The estimated all-sky instantaneous SLDR was combined with ERA5 temporal information to derive daily SLDR using a bias-corrected sinusoidal integration method, yielding a bias of 0.04 W/m2 and an RMSE of 16.77 W/m2. These results demonstrate the robustness of the proposed framework and its substantial potential in generating both instantaneous and daily SLDR products at 1 km spatial resolution.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145830854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
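The bias/RMSE pairs quoted in the SLDR abstract above are standard validation statistics. The sketch below shows how such a pair is typically computed, assuming bias is defined as the mean of (estimated minus observed). The abstract does not state its sign convention, and the sample values are hypothetical, not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def bias_and_rmse(estimated: Sequence[float], observed: Sequence[float]) -> Tuple[float, float]:
    """Return (bias, RMSE) of estimates against in situ observations, in the input units."""
    if len(estimated) != len(observed) or len(estimated) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [e - o for e, o in zip(estimated, observed)]
    bias = sum(errors) / len(errors)  # mean signed error
    rmse = math.sqrt(sum(err * err for err in errors) / len(errors))
    return bias, rmse


# Hypothetical SLDR samples in W/m^2 (illustrative only).
est = [310.0, 295.0, 402.0, 350.0]
obs = [300.0, 300.0, 400.0, 355.0]
bias, rmse = bias_and_rmse(est, obs)
```

A small bias alongside a larger RMSE, as in the daily result reported above, indicates that errors largely cancel on average while individual estimates still scatter around the observations.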