
Latest articles from ISPRS Journal of Photogrammetry and Remote Sensing

Multimodal remote sensing change detection: An image matching perspective
IF 12.2 CAS Zone 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date: 2026-03-01 Epub Date: 2026-02-06 DOI: 10.1016/j.isprsjprs.2026.02.004 Vol. 233, Pages 487-501
Hongruixuan Chen , Cuiling Lan , Jian Song , Damian Ibañez , Junshi Xia , Konrad Schindler , Naoto Yokoya
Change Detection (CD) between images of different modalities is a fundamental capability for remote sensing. In this work, we pinpoint the commonalities between Multimodal Change Detection (MCD) and Multimodal Image Matching (MIM). Accordingly, we present a new unsupervised CD framework designed from the perspective of Image Matching (IM), called IM4CD, which unifies the IM and CD tasks into a single, coherent framework. We abandon the strategy prevalent in MCD of comparing per-pixel image features, since it is in practice quite difficult to design features that are truly invariant across modalities. Instead, we compute similarity by local template matching and use the spatial offset of response peaks to represent change intensity between images of different modalities, integrating this tightly with the co-registration of the two images, which in any case includes such a matching step. In this way, the same off-the-shelf descriptors used for MIM also support MCD. Concretely, we first extract modality-independent features, then detect salient points to obtain initial pairs of corresponding Control Points (CPs). When those points are matched to accurately register the images, CP pairs located in unchanged areas show low residuals, whereas those in changed areas show high residuals. The CPs can then be connected into a Conditional Random Field (CRF) that leverages modality-independent structural relationships to estimate dense change maps. Experimental results show the effectiveness of our method, including robustness to registration errors, compatibility with different image descriptors, and promising potential for challenging real-world disaster-response scenarios.
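The core idea above — using the spatial offset of a template-matching response peak as a change-intensity score — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature maps, window sizes, and the plain normalized cross-correlation used here are all illustrative assumptions.

```python
import numpy as np

def match_offset(feat_a, feat_b, cp, tmpl=7, search=15):
    """Template-match a (tmpl x tmpl) patch of feat_b centred on control
    point `cp` inside a (search x search) window of feat_a, and return the
    offset of the correlation peak from the window centre. Large offsets
    (high matching residuals) suggest change; small offsets suggest an
    unchanged, well-registered area."""
    y, x = cp
    t, s = tmpl // 2, search // 2
    template = feat_b[y - t:y + t + 1, x - t:x + t + 1]
    window = feat_a[y - s:y + s + 1, x - s:x + s + 1]

    n = search - tmpl + 1                 # candidate positions per axis
    scores = np.full((n, n), -np.inf)
    tz = (template - template.mean()) / (template.std() + 1e-8)
    for i in range(n):
        for j in range(n):
            patch = window[i:i + tmpl, j:j + tmpl]
            pz = (patch - patch.mean()) / (patch.std() + 1e-8)
            scores[i, j] = (tz * pz).mean()   # normalised cross-correlation

    pi, pj = np.unravel_index(np.argmax(scores), scores.shape)
    centre = (n - 1) / 2
    return np.hypot(pi - centre, pj - centre)  # residual = change-intensity proxy

# Unchanged area: identical structure, so the peak sits at the centre.
rng = np.random.default_rng(0)
a = rng.standard_normal((31, 31))
print(match_offset(a, a.copy(), cp=(15, 15)))  # 0.0
```

A shifted (or changed) scene moves the response peak away from the centre, and that residual is exactly what IM4CD reads as change intensity.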
MMP-Mapper: Multi-modal priors enhancing vectorized HD road map construction from aerial imagery
IF 12.2 CAS Zone 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date: 2026-03-01 Epub Date: 2026-02-07 DOI: 10.1016/j.isprsjprs.2026.02.008 Vol. 233, Pages 543-555
Haofeng Xie , Huiwei Jiang , Yandi Yang , Xiangyun Hu
High-definition (HD) road maps are indispensable for autonomous driving, supporting tasks such as localization, planning, and navigation. The traditional construction of HD road maps relies heavily on manual annotation of data from LiDAR, cameras, and GPS/IMU, a process that is both costly and time-consuming. While recent work has explored automatic HD road map extraction from aerial imagery, a data source offering broad-area coverage and superior robustness, existing methods face a critical limitation: they often process only a single, isolated image tile, failing to leverage crucial spatial context and semantic priors from multi-modal data sources. This shortcoming severely impacts map accuracy and continuity, especially at complex intersections and in occluded areas. To overcome these challenges, we propose MMP-Mapper, a novel framework that enhances HD road map construction with multi-modal priors. MMP-Mapper introduces two key modules: (1) the Contextual Image Fusion (CIF) module, which selects and fuses features from neighboring image tiles to provide spatial continuity; and (2) the Map-Guided Fusion (MGF) module, which uses a Transformer to fuse the encoded semantic attributes of standard-definition (SD) road maps with geometric priors, guiding HD road map construction. We validate our framework on the Aerial Argoverse 2 and OpenSatMap datasets. Our results demonstrate that MMP-Mapper outperforms state-of-the-art baselines in both accuracy and generalization for aerial-imagery-based HD road map construction.
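The CIF module's role — injecting context from neighboring tiles into the centre tile — might be sketched roughly as follows. This is a loose stand-in, not the paper's architecture: the real module selects and fuses learned features, whereas here neighbors are simply pooled to global descriptors and broadcast back onto the centre tile.

```python
import numpy as np

def fuse_neighbor_tiles(center, neighbors):
    """Fuse a centre tile's feature map (C, H, W) with pooled context
    from neighbouring tiles, so per-pixel predictions on the centre tile
    can see beyond the tile boundary."""
    # Global-average-pool each neighbour to a (C,) descriptor.
    ctx = np.stack([n.mean(axis=(1, 2)) for n in neighbors])  # (N, C)
    ctx = ctx.mean(axis=0)                                    # aggregate neighbours
    # Broadcast the context vector over the spatial grid and concatenate.
    c, h, w = center.shape
    ctx_map = np.broadcast_to(ctx[:, None, None], (c, h, w))
    return np.concatenate([center, ctx_map], axis=0)          # (2C, H, W)

center = np.zeros((8, 16, 16))
neighbors = [np.ones((8, 16, 16)) for _ in range(4)]
fused = fuse_neighbor_tiles(center, neighbors)
print(fused.shape)  # (16, 16, 16)
```

A learned variant would replace the unweighted mean with attention over neighbor descriptors, which is closer in spirit to the selection step the abstract describes.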
Satellite-based heat Index estimatioN modEl (SHINE): An integrated machine learning approach for the conterminous United States
IF 12.2 CAS Zone 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date: 2026-03-01 Epub Date: 2026-01-23 DOI: 10.1016/j.isprsjprs.2026.01.018 Vol. 233, Pages 209-230
Seyed Babak Haji Seyed Asadollah, Giorgos Mountrakis, Stephen B. Shaw
The accelerating frequency, duration and intensity of extreme heat events demand accurate, spatially complete heat exposure metrics. Here, a modeling approach is presented for estimating the daily-maximum Heat Index (HI) at 1 km spatial resolution. Our study area covered the conterminous United States (CONUS) during the warm season (May to September) between 2003 and 2023. More than 4.6 million observations from approximately 2000 weather stations were paired with weather-related, geographical, land cover and historical climatic factors to develop the proposed Satellite-based Heat Index estimatioN modEl (SHINE). Selected explanatory variables at daily temporal intervals included reanalysis products from Modern-Era Retrospective analysis for Research and Applications (MERRA) and direct satellite products from the Moderate Resolution Imaging Spectroradiometer (MODIS) sensor.
The most influential variables for HI estimation were the MERRA surface layer height and specific humidity products and the dual-pass MODIS daily land surface temperatures. These were followed by land cover products capturing water and forest presence, historical norms of wind speed and maximum temperature, elevation information and the corresponding day of year. An Extreme Gradient Boosting (XGBoost) regressor trained with spatial cross-validation explained 93% of the variance (R² = 0.93) and attained a Root Mean Square Error (RMSE) of 1.9°C and a Mean Absolute Error (MAE) of 1.4°C. Comparison of alternative configurations showed that while a MERRA-only model provided slightly higher accuracy (RMSE of 1.8°C), its coarse resolution failed to capture fine-scale heat variations. Conversely, a MODIS-only model offered kilometer-scale spatial resolution but with higher estimation errors (RMSE of 2.9°C). Integrating both MERRA and MODIS sources enabled SHINE to maintain spatial detail and preserve accuracy, underscoring the complementary strengths of reanalysis and satellite products. SHINE also demonstrated resistance to missing MODIS LST observations due to clouds, as the additional RMSE was approximately 0.5°C in the worst case of missing both morning and afternoon MODIS land surface temperature observations. Spatial error analysis revealed <1.7°C RMSE in arid and Mediterranean zones but larger, more heterogeneous errors in the humid Midwest and High Plains. From the policy perspective, and considering the HI operational range for public-health heat effects, the proposed SHINE approach outperformed typically used proxies, such as land surface and air temperature. The resulting 1 km daily HI estimations can potentially serve as the foundation of the first wall-to-wall, multi-decadal, high-resolution heat dataset for CONUS and offer actionable information for public-health heat studies, energy-demand forecasting and environmental-justice implications.
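The spatial cross-validation used to evaluate SHINE can be illustrated with a small sketch: stations are grouped into coarse geographic blocks, and whole blocks are held out together so spatially autocorrelated neighbors never straddle a train/test split. The 2-degree cell size, the fold count, and the hashing of blocks below are illustrative assumptions; only the splitting logic and the reported error metrics (RMSE/MAE) are shown, not the XGBoost model itself.

```python
import numpy as np

def spatial_block_folds(lat, lon, k=5, cell=2.0, seed=0):
    """Assign stations to k cross-validation folds by gridding their
    coordinates into `cell`-degree blocks and shuffling whole blocks,
    so every station in a block lands in the same fold."""
    block = (np.floor(lat / cell).astype(int) * 10_000
             + np.floor(lon / cell).astype(int))    # unique id per grid cell
    uniq = np.unique(block)
    rng = np.random.default_rng(seed)
    fold_of_block = dict(zip(uniq, rng.permutation(len(uniq)) % k))
    return np.array([fold_of_block[b] for b in block])

def rmse_mae(y_true, y_pred):
    """The two headline error metrics reported for SHINE."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt((err ** 2).mean())), float(np.abs(err).mean())

# Synthetic CONUS-like station coordinates.
rng = np.random.default_rng(1)
lat, lon = rng.uniform(25, 49, 500), rng.uniform(-124, -67, 500)
folds = spatial_block_folds(lat, lon)
print(len(np.unique(folds)))  # 5
```

In practice the held-out folds would each be predicted by a regressor trained on the remaining blocks, and `rmse_mae` aggregated over all held-out predictions.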
Harnessing conditional generative adversarial networks for SAR-to-optical image translation via auxiliary geospatial landscape pattern-augmentation
IF 12.2 CAS Zone 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date: 2026-03-01 Epub Date: 2026-02-06 DOI: 10.1016/j.isprsjprs.2026.01.043 Vol. 233, Pages 502-518
Hongbo Liang , Xuezhi Yang , Xiangyu Yang , Xin Jing
Synthetic aperture radar (SAR) enables all-weather, all-day Earth observation, yet speckle noise and geometric distortions from complex electromagnetic scattering imaging severely limit its visual interpretability. SAR-to-optical image translation (S2OIT) has emerged to mitigate these challenges, but remains hindered by the data heterogeneity and spectral discrepancies between the SAR and optical domains, where integrating auxiliary knowledge offers a viable remedy. Moreover, previous studies, which rely on a pixel-wise constrained adversarial learning paradigm with limited mining of geospatial landscape information, are prone to generating low-fidelity images. To tackle these issues, we propose AGPA-CGAN, a conditional generative adversarial network (CGAN) framework with auxiliary geospatial landscape pattern-augmentation for high-quality S2OIT. AGPA-CGAN progressively narrows the gap between translated and reference images by integrating ample SAR prior properties and geospatial structural information from scenario image pairs into the S2OIT process. Specifically, to fully exploit the rich priors of SAR images, we design an auxiliary pseudo-scattering pattern integration (APSPI) module to extract hierarchical subspace frequency conditional representations, thereby aiding AGPA-CGAN in capturing more descriptive cues for S2OIT. In particular, we introduce an unsupervised subspace embedding clustering (SEC) algorithm based on subspace frequency analysis (SSFA) within APSPI to derive statistical pseudo-scattering behavior maps against SAR feature spectra. Furthermore, to stabilize the integration of SAR priors, we propose a geospatial landscape domain alignment (Geo-LDA) module that applies multi-perspective consistency regularization to align structural correspondences between SAR and optical features.
Extensive experiments on three challenging benchmarks demonstrate that AGPA-CGAN surpasses state-of-the-art (SOTA) methods in both translation fidelity and structural realism.
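The subspace frequency analysis (SSFA) that APSPI builds on is not detailed in the abstract; as a generic stand-in, the sketch below splits an image into radial frequency sub-bands with the 2-D FFT. The band edges are arbitrary assumptions chosen so the sub-bands partition the spectrum and sum back to the input.

```python
import numpy as np

def frequency_subbands(img, bands=((0.0, 0.1), (0.1, 0.3), (0.3, 0.8))):
    """Decompose an image into radial frequency sub-bands via the 2-D FFT.
    Each band keeps spectral coefficients whose normalised radial
    frequency falls in [lo, hi); the bands here tile the whole spectrum."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.hypot(fy, fx)                # normalised radial frequency
    spec = np.fft.fft2(img)
    out = []
    for lo, hi in bands:
        mask = (radius >= lo) & (radius < hi)
        out.append(np.real(np.fft.ifft2(spec * mask)))
    return out  # one array per sub-band; together they reconstruct img

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
subbands = frequency_subbands(img)
recon = sum(subbands)
print(np.allclose(recon, img))  # True
```

Per-band statistics of such decompositions are the kind of conditional representation a clustering step (like the paper's SEC) could operate on; the actual APSPI design is more elaborate.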
WHU-STree: A multi-modal benchmark dataset for street tree inventory
IF 12.2 CAS Zone 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date: 2026-03-01 Epub Date: 2026-02-06 DOI: 10.1016/j.isprsjprs.2026.02.011 Vol. 233, Pages 519-542
Ruifei Ding , Zhe Chen , Wen Fan , Chen Long , Huijuan Xiao , Yelu Zeng , Zhen Dong , Bisheng Yang
Street trees are vital to urban livability, providing ecological and social benefits. Establishing a detailed, accurate, and dynamically updated street tree inventory has become essential for optimizing these multifunctional assets within space-constrained urban environments. Given that traditional field surveys are time-consuming and labor-intensive, automated surveys utilizing Mobile Mapping Systems (MMS) offer a more efficient solution. However, existing MMS-acquired tree datasets are limited by small-scale scenes, limited annotations, or a single modality, restricting their utility for comprehensive analysis. To address these limitations, we introduce WHU-STree, a cross-city, richly annotated, and multi-modal urban street tree dataset. Collected across two distinct cities, WHU-STree integrates synchronized point clouds and high-resolution images, encompassing 21,007 annotated tree instances across 50 species and 2 morphological parameters. Leveraging these unique characteristics, WHU-STree concurrently supports over 10 tasks related to street tree inventory. We benchmark representative baselines for two key tasks, tree species classification and individual tree segmentation, based on 18 major species and an “Others” category. Extensive experiments demonstrate that while multi-modal fusion yields improvements over uni-modal baselines, it currently shows performance gaps compared to strong 3D-only methods, indicating that effective fusion remains a challenging open problem requiring further research. In particular, we identify key challenges and outline potential future work for fully exploiting WHU-STree, encompassing multi-modal fusion, multi-task collaboration, cross-domain generalization, spatial pattern learning, and Multi-modal Large Language Models for street tree asset management. The WHU-STree dataset is accessible at: https://github.com/WHU-USI3DV/WHU-STree.
EAV-DETR: Efficient Arbitrary-View oriented object detection with probabilistic guarantees for UAV imagery
IF 12.2 CAS Zone 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date: 2026-03-01 Epub Date: 2026-02-10 DOI: 10.1016/j.isprsjprs.2026.02.009 Vol. 233, Pages 575-587
Haoyu Zuo , Minghao Ning , Yiming Shu , Shucheng Huang , Chen Sun
Oriented object detection is critical for enhancing the visual perception of unmanned aerial vehicles (UAVs). However, existing detectors, primarily designed for general aerial imagery, often struggle to address the unique challenges of UAV imagery, including substantial scale variations, dense clustering, and arbitrary orientations. Furthermore, these models lack the probabilistic guarantees required for safety-critical applications. To address these challenges, we propose EAV-DETR, an efficient oriented object detection transformer designed for UAV imagery. Specifically, we first propose a novel scale-adaptive center supervision (SACS) strategy that explicitly enhances the encoder’s feature representations by imposing pixel-level localization constraints with zero inference overhead. Second, we design an anisotropic decoupled rotational attention (ADRA) module, which achieves superior feature alignment for objects of arbitrary morphology by generating a non-rigid adaptive sampling field. Finally, we propose a pose-aware Mondrian conformal prediction (PA-MCP) method, which utilizes the UAV’s flight pose as a physical prior for generating prediction sets with conditional coverage guarantees, thereby providing reliable uncertainty quantification. Extensive experiments on multiple aerial imagery datasets validate the effectiveness of our model. Compared to previous state-of-the-art methods, EAV-DETR improves AP75 on CODrone by 1.76% while achieving a 52% faster inference speed (46.38 vs 30.55 FPS), and improves AP50:95 on UAV-ROD by 3.17%. Our code is available at https://github.com/zzzhak/EAV-DETR.
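The Mondrian (group-conditional) conformal step behind PA-MCP can be illustrated independently of the detector: calibration nonconformity scores are split by group, here a hypothetical UAV pose bin, and each group gets its own quantile threshold, which is what restores per-group (conditional) coverage. This sketch shows only the standard split-conformal machinery, not the paper's pose binning or score design.

```python
import numpy as np

def mondrian_quantiles(scores, groups, alpha=0.1):
    """Per-group split-conformal thresholds: for each group, take the
    ceil((n+1)(1-alpha))-th smallest calibration nonconformity score,
    which guarantees >= 1-alpha coverage within that group."""
    q = {}
    for g in np.unique(groups):
        s = np.sort(scores[groups == g])
        n = len(s)
        k = int(np.ceil((n + 1) * (1 - alpha)))   # rank of the quantile
        q[g] = s[min(k, n) - 1]
    return q

# Calibration scores for two pose bins; bin 1 is noisier, so it gets a
# larger threshold -- exactly the per-group adaptation PA-MCP relies on.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.exponential(1.0, 200), rng.exponential(3.0, 200)])
groups = np.repeat([0, 1], 200)
q = mondrian_quantiles(scores, groups, alpha=0.1)
print(q[1] > q[0])  # True
```

At test time, a detection whose nonconformity score falls below its pose bin's threshold is kept in the prediction set, yielding the stated conditional coverage guarantee.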
{"title":"EAV-DETR: Efficient Arbitrary-View oriented object detection with probabilistic guarantees for UAV imagery","authors":"Haoyu Zuo ,&nbsp;Minghao Ning ,&nbsp;Yiming Shu ,&nbsp;Shucheng Huang ,&nbsp;Chen Sun","doi":"10.1016/j.isprsjprs.2026.02.009","DOIUrl":"10.1016/j.isprsjprs.2026.02.009","url":null,"abstract":"<div><div>Oriented object detection is critical for enhancing the visual perception of unmanned aerial vehicles (UAVs). However, existing detectors primarily designed for general aerial imagery often struggle to address the unique challenges of UAV imagery, including substantial scale variations, dense clustering, and arbitrary orientations. Furthermore, these models lack probabilistic guarantees required for safety-critical applications. To address these challenges, we propose EAV-DETR, an efficient oriented object detection transformer designed for UAV imagery. Specifically, we first propose a novel scale-adaptive center supervision (SACS) strategy that explicitly enhances the encoder’s feature representations by imposing pixel-level localization constraints with zero inference overhead. Second, we design an anisotropic decoupled rotational attention (ADRA) module, which achieves superior feature alignment for objects of arbitrary morphology by generating a non-rigid adaptive sampling field. Finally, we propose a pose-aware Mondrian conformal prediction (PA-MCP) method, which utilizes the UAV’s flight pose as a physical prior to generate prediction sets with conditional coverage guarantees, thereby providing reliable uncertainty quantification. Extensive experiments on multiple aerial imagery datasets validate the effectiveness of our model. 
Compared to previous state-of-the-art methods, EAV-DETR improves <span><math><msub><mrow><mtext>AP</mtext></mrow><mrow><mn>75</mn></mrow></msub></math></span> on CODrone by 1.76% while achieving a 52% faster inference speed (46.38 vs 30.55 FPS), and improves <span><math><msub><mrow><mtext>AP</mtext></mrow><mrow><mn>50</mn><mo>:</mo><mn>95</mn></mrow></msub></math></span> on UAV-ROD by 3.17%. Our code is available at <span><span>https://github.com/zzzhak/EAV-DETR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"233 ","pages":"Pages 575-587"},"PeriodicalIF":12.2,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146146708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
RegScorer: Learning to select the best transformation of point cloud registration
IF 12.2, CAS Tier 1 (Earth Science), Q1 GEOGRAPHY, PHYSICAL. Pub Date: 2026-03-01, Epub Date: 2026-01-27. DOI: 10.1016/j.isprsjprs.2026.01.034
Xiaochen Yang , Haiping Wang , Yuan Liu , Bisheng Yang , Zhen Dong
We propose RegScorer, a model that learns to identify the optimal transformation for registering unaligned point clouds. Existing registration methods can generate a set of candidate transformations, which are then evaluated using conventional metrics such as Inlier Ratio (IR), Mean Squared Error (MSE), or Chamfer Distance (CD). The candidate achieving the best score is selected as the final result. However, we argue that these metrics often fail to select the correct transformation, especially in challenging scenarios involving symmetric objects, repetitive structures, or low-overlap regions. This leads to significant degradation in registration performance, a problem that has long been overlooked. The core issue lies in their limited focus on local geometric consistency and inability to capture two key conflict cases of misalignment: (1) point pairs that are spatially close after alignment but have conflicting features, and (2) point pairs with high feature similarity but large spatial distances after alignment. To address this, we propose RegScorer, which models both the spatial and feature relationships of all point pairs. This allows RegScorer to learn to capture the above conflict cases and provides a more reliable score for transformation quality. On the 3DLoMatch and ScanNet datasets, RegScorer demonstrates 19.3% and 14.1% improvements in registration recall, leading to 4.7% and 5.1% accuracy gains in multiview registration. Moreover, when generalized to symmetric and low-texture outdoor scenes, RegScorer achieves a 25% increase in transformation recall over the IR metric, highlighting its robustness and generalizability. The pre-trained model and the complete code repository can be accessed at https://github.com/WHU-USI3DV/RegScorer.
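The conventional IR-based selection that RegScorer argues against can be sketched in a few lines: warp the source cloud with each candidate rigid transform and count how many points land within a distance tolerance of the target. A brute-force NumPy version with names of my own choosing, purely to illustrate the baseline being criticized:

```python
import numpy as np

def inlier_ratio(src, tgt, R, t, tau=0.1):
    """Fraction of transformed source points whose nearest target point
    lies within tau -- the IR score that, per the paper, can mis-rank
    candidates on symmetric or repetitive geometry."""
    warped = src @ R.T + t
    # pairwise distances (N_src x N_tgt); fine for small clouds only
    d = np.linalg.norm(warped[:, None, :] - tgt[None, :, :], axis=-1)
    return float((d.min(axis=1) < tau).mean())

def select_best(src, tgt, candidates, tau=0.1):
    """Pick the candidate (R, t) with the highest inlier ratio."""
    scores = [inlier_ratio(src, tgt, R, t, tau) for R, t in candidates]
    return int(np.argmax(scores)), scores
```

Because IR only checks local geometric proximity, two candidates can tie on symmetric structures; RegScorer's contribution is to break such ties by modeling feature conflicts as well.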
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 233, pp. 266–277, 2026.
VectorLLM: Human-like extraction of structured building contours via multimodal LLMs
IF 12.2, CAS Tier 1 (Earth Science), Q1 GEOGRAPHY, PHYSICAL. Pub Date: 2026-03-01, Epub Date: 2026-01-19. DOI: 10.1016/j.isprsjprs.2026.01.025
Tao Zhang , Shiqing Wei , Shihao Chen , Wenling Yu , Muying Luo , Shunping Ji
Automatically extracting vectorized building contours from remote sensing imagery is crucial for urban planning, population estimation, and disaster assessment. Current state-of-the-art methods rely on complex multi-stage pipelines involving pixel segmentation, vectorization, and polygon refinement, which limits their scalability and real-world applicability. Inspired by the remarkable reasoning capabilities of Large Language Models (LLMs), we introduce VectorLLM, the first Multi-modal Large Language Model (MLLM) designed for regular building contour extraction from remote sensing images. Unlike existing approaches, VectorLLM performs corner-point by corner-point regression of building contours directly, mimicking human annotators’ labeling process. Our architecture consists of a vision foundation backbone, an MLP connector, and an LLM, enhanced with learnable position embeddings to improve spatial understanding capability. Through comprehensive exploration of training strategies including pretraining, supervised fine-tuning, and direct preference optimization across WHU, WHU-Mix, and CrowdAI datasets, VectorLLM outperforms the previous SOTA methods. Remarkably, VectorLLM exhibits strong zero-shot performance on unseen objects including aircraft, water bodies, and oil tanks, highlighting its potential for unified modeling of diverse remote sensing object contour extraction tasks. Overall, this work establishes a new paradigm for vector extraction in remote sensing, leveraging the topological reasoning capabilities of LLMs to achieve both high accuracy and exceptional generalization. All code and weights will be available at https://github.com/zhang-tao-whu/VectorLLM.
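The corner-point-by-corner-point regression can be pictured as serializing a polygon into a token sequence, the way an annotator clicks vertices one at a time. A toy coordinate-quantization scheme makes the idea concrete; the token vocabulary below (`<poly>`, `x_i`, `y_i`) is hypothetical, not VectorLLM's actual one:

```python
def polygon_to_tokens(corners, bins=256, size=512):
    """Quantize each corner onto a bins x bins grid and emit one
    (x, y) token pair per corner, in annotation order."""
    toks = ["<poly>"]
    for x, y in corners:
        qx = min(int(x / size * bins), bins - 1)  # clamp the right/bottom edge
        qy = min(int(y / size * bins), bins - 1)
        toks += [f"x_{qx}", f"y_{qy}"]
    toks.append("</poly>")
    return toks

def tokens_to_polygon(tokens, bins=256, size=512):
    """Inverse mapping back to (approximate) image coordinates; the
    round trip loses at most one grid cell of precision."""
    xy = [int(t.split("_")[1]) for t in tokens if "_" in t]
    scale = size / bins
    return [(xy[i] * scale, xy[i + 1] * scale) for i in range(0, len(xy), 2)]
```

Framing contours this way lets an autoregressive LLM emit vertices directly, which is what removes the segmentation-vectorization-refinement pipeline the abstract criticizes.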
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 233, pp. 55–68, 2026.
Multi-object tracking of vehicles and anomalous states in remote sensing videos: Joint learning of historical trajectory guidance and ID prediction
IF 12.2, CAS Tier 1 (Earth Science), Q1 GEOGRAPHY, PHYSICAL. Pub Date: 2026-03-01, Epub Date: 2026-01-31. DOI: 10.1016/j.isprsjprs.2026.01.038
Bin Wang , Yuan Zhou , Haigang Sui , Guorui Ma , Peng Cheng , Di Wang
Research on multi-object tracking (MOT) of vehicles based on remote sensing video data has achieved breakthrough progress. However, MOT of vehicles in complex scenarios and their anomalous states after being subjected to strong deformation interference remains a huge challenge. This is of great significance for military defense, traffic flow management, vehicle damage assessment, etc. To address this problem, this study proposes an end-to-end MOT method that integrates a joint learning paradigm of historical trajectory guidance and identity (ID) prediction, aiming to bridge the gap between vehicle detection and continuous tracking after anomalous states occurrence. The proposed network framework primarily consists of a Frame Feature Aggregation Module (FFAM) that enhances spatial consistency of objects across consecutive video frames, a Historical Tracklets Flow Encoder (HTFE) that employs Mamba blocks to guide object embedding within potential motion flows based on historical frames, and a Semantic-Consistent Clustering Module (SCM) constructed via sparse attention computation to capture global semantic information. The discriminative features extracted by these modules are fused by a Dual-branch Modulation Fusion Unit (DMFU) to maximize the performance of the model. This study also constructs a new dataset for MOT of vehicles and anomalous states in videos, termed the VAS-MOT dataset. Extensive validation experiments conducted on this dataset demonstrate that the method achieves the highest level of performance, with HOTA and MOTA reaching 68.2% and 71.5%, respectively. Additional validation on the open-source dataset IRTS-AG confirms the strong robustness of the proposed method, showing excellent performance in long-term tracking of small vehicles in infrared videos under complex scenarios, where HOTA and MOTA reached 70.9% and 91.6%, respectively. 
The proposed method provides valuable insights for capturing moving objects and their anomalous states, laying a foundation for further damage assessment.
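For contrast with the learned ID-prediction head above, the classical baseline for MOT association links each existing track to the detection it overlaps most in the next frame. A minimal greedy IoU matcher, illustrative only and not the paper's method:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def associate(tracks, detections, thresh=0.3):
    """Greedy frame-to-frame association: repeatedly match the highest-IoU
    (track, detection) pair until no pair above the threshold remains.
    Returns a dict mapping track index -> detection index."""
    pairs = sorted(((iou(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)), reverse=True)
    matched_t, matched_d, out = set(), set(), {}
    for s, ti, di in pairs:
        if s < thresh or ti in matched_t or di in matched_d:
            continue
        matched_t.add(ti); matched_d.add(di); out[ti] = di
    return out
```

Pure overlap matching breaks exactly where the paper focuses: when a vehicle deforms into an anomalous state, its box and appearance change abruptly, which is why the method instead conditions on historical trajectory flows and predicts IDs directly.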
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 233, pp. 383–406, 2026.
A novel transformer-based CO2 retrieval framework incorporating prior constraint and hierarchical features injection: assessment of transferability for Tansat-2
IF 12.2, CAS Tier 1 (Earth Science), Q1 GEOGRAPHY, PHYSICAL. Pub Date: 2026-03-01, Epub Date: 2026-02-02. DOI: 10.1016/j.isprsjprs.2026.01.039
Lingfeng Zhang , Lu Zhang , Xingying Zhang , Tiantao Cheng , Xifeng Cao , Tongwen Li , Dongdong Liu , Yang Zhang , Yuhan Jiang , Ruohua Hu , Haiyang Dou , Lin Chen
Carbon dioxide (CO2), the primary contributor to global warming, significantly impacts global climate change. Remote sensing is an effective approach for monitoring atmospheric CO2 concentrations. However, the full-physics Optimal Estimation (OE) method commonly used for satellite retrievals is time-consuming and requires advanced equipment. Additionally, traditional deep learning algorithms for satellite CO2 retrieval suffer from limited accuracy and an inability to extrapolate effectively to unseen high values caused by the gradually increasing concentrations over time. Balancing efficiency and extrapolation capability is a critical task, especially for the next generation of large-swath carbon satellites with significantly larger data volumes, such as Tansat-2. In this study, we first employed OCO-2 data from 2020 to construct a Transformer-based architecture integrating a prior constraint and a hierarchical feature injection mechanism for high-precision CO2 retrieval, achieving R, RMSE, and MAPE of 0.939, 0.746 ppm, and 0.132 %. We then evaluated the model's extrapolation capability using OCO-2 data from 2021 to 2024, demonstrating robust performance and strong generalization ability (R = 0.938–0.951, RMSE = 1.083–1.310 ppm, MAPE = 0.208–0.256 %). Finally, we assessed the transferability of the model using simulated Tansat-2 data (August 18, 2020), achieving R = 0.657, RMSE = 1.299 ppm, and MAPE = 0.239 %, indicating effective transfer capability. The proposed model has the potential to provide a feasible solution for rapidly retrieving high-precision CO2, especially for the next generation of large-swath carbon satellites.
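The three metrics the abstract reports (R, RMSE, MAPE) are standard and easy to reproduce; a short sketch assuming XCO2 values in ppm, with R as the Pearson correlation coefficient:

```python
import numpy as np

def retrieval_metrics(y_true, y_pred):
    """R (Pearson correlation), RMSE (same units as the input, ppm here),
    and MAPE (percent), as commonly reported for XCO2 retrieval accuracy."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r = np.corrcoef(y_true, y_pred)[0, 1]
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0
    return r, rmse, mape
```

Note that because background XCO2 sits around 400 ppm, even a 1 ppm RMSE corresponds to a MAPE of only about 0.25 %, which is why the reported MAPE values look so small.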
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 233, pp. 423–436, 2026.