Pub Date: 2026-03-01. Epub Date: 2026-02-04. DOI: 10.1016/j.isprsjprs.2026.01.042
Liujun Zhu , Yaqian Li , Shanshui Yuan , Shi Shi , Fang Ji
Tropical cyclones (TCs) are among the most destructive natural hazards, frequently causing widespread power outages (POs) in coastal urban areas that disrupt economic activity and social stability. Quantifying TC-induced POs remains challenging due to limited outage data availability. This study presents the first global detection and quantification of TC-induced POs using NASA’s Black Marble nighttime lights (NTL) data. The proposed method analyzed changes in NTL brightness within urban agglomerations by establishing pre-TC baselines and applying statistical outlier detection to identify outages. The algorithm detected a total of 1,239 POs from 19,999 agglomeration–TC events between 2012 and 2023, and the corresponding outage duration and severity were also estimated. Validation against media reports showed an overall accuracy of 0.78, with accuracy improving with TC intensity. Case studies demonstrated robust performance in regions with vulnerable infrastructure and high-quality NTL observations, such as North America, while performance declined in areas affected by frequent data gaps or rapid restoration, notably East Asia and India. Although only 50% of agglomeration–TC events could be evaluated due to missing NTL data, this work offers a scalable, near-real-time approach to global TC-induced PO monitoring, providing critical insights for urban resilience planning, disaster response, and power system management.
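The baseline-plus-outlier-detection step can be illustrated with a minimal sketch (not the authors' implementation; the z-score threshold and the severity definition below are illustrative assumptions):

```python
import numpy as np

def detect_outage(pre_tc_ntl, post_tc_ntl, z_thresh=-2.0):
    """Flag a power outage when post-TC nighttime-light radiance drops below
    a statistical threshold derived from the pre-TC baseline.

    pre_tc_ntl  : 1-D array of pre-storm nightly radiance for one agglomeration
    post_tc_ntl : 1-D array of post-storm nightly radiance
    Returns (boolean mask of outage nights, fractional brightness loss).
    """
    baseline = np.nanmean(pre_tc_ntl)          # pre-TC mean brightness
    sigma = np.nanstd(pre_tc_ntl)              # pre-TC variability
    z = (post_tc_ntl - baseline) / sigma       # negative z = dimming
    outage_nights = z < z_thresh
    severity = (1.0 - post_tc_ntl[outage_nights].mean() / baseline
                if outage_nights.any() else 0.0)
    return outage_nights, severity
```

The outage duration would then simply be the length of the consecutive run of flagged nights.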
Title: Monitoring global power outages induced by tropical cyclones using nighttime light data. ISPRS Journal of Photogrammetry and Remote Sensing, Volume 233, Pages 437–451.
Pub Date: 2026-03-01. Epub Date: 2026-01-20. DOI: 10.1016/j.isprsjprs.2026.01.023
Ruqin Zhou , Chenguang Dai , Wanshou Jiang , Yongsheng Zhang , Zhenchao Zhang , San Jiang
Vectorized high-definition (HD) map construction is formulated as the task of classifying and localizing typical map elements based on features in a bird’s-eye view (BEV). It is essential for autonomous driving systems, providing interpretable, structured environmental representations for decision-making and planning. Remarkable progress has been made in recent years, but several major issues remain: (1) in BEV feature generation, single-modality methods suffer from limited perception capability and range, while existing multi-modal fusion approaches underutilize cross-modal synergies and fail to resolve spatial disparities between modalities, resulting in misaligned BEV features with holes; (2) in the classification and localization of map elements, existing methods rely heavily on point-level information while neglecting the information between elements and between points and elements, leading to low accuracy with erroneous shapes and element entanglement. To address these limitations, we propose SuperMapNet, a multi-modal framework designed for long-range and high-accuracy vectorized HD map construction. The framework takes both camera images and LiDAR point clouds as input. It first tightly couples semantic information from camera images and geometric information from LiDAR point clouds via a cross-attention-based synergy enhancement module and a flow-based disparity alignment module for long-range BEV feature generation. Subsequently, local information acquired by point queries and global information acquired by element queries are tightly coupled through three-level interactions for high-accuracy classification and localization, where Point2Point interaction captures local geometric consistency between points of the same element, Element2Element interaction learns global semantic relationships between elements, and Point2Element interaction complements element information for its constituent points.
Experiments on the nuScenes and Argoverse2 datasets demonstrate high accuracy, surpassing previous state-of-the-art methods (SOTAs) by 14.9%/8.8% and 18.5%/3.1% mAP under the hard/easy settings, respectively, even at double the perception range (up to 120 m along the X-axis and 60 m along the Y-axis). The code is publicly available at https://github.com/zhouruqin/SuperMapNet.
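The cross-attention coupling of the two modalities can be sketched in its generic single-head form (a simplified stand-in for the paper's synergy enhancement module; the NumPy formulation and shapes are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, key_feats, value_feats):
    """Single-head cross-attention: one modality's BEV cells (queries)
    attend to the other modality's BEV cells (keys/values), so camera
    semantics and LiDAR geometry can borrow from each other.

    query_feats: (N_q, d); key_feats, value_feats: (N_kv, d).
    Returns fused features of shape (N_q, d).
    """
    d = query_feats.shape[-1]
    scores = query_feats @ key_feats.T / np.sqrt(d)   # (N_q, N_kv)
    weights = softmax(scores, axis=-1)                # rows sum to 1
    return weights @ value_feats
```

In a full model the queries would carry learned projections and positional encodings; this sketch only shows the attention arithmetic itself.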
Title: SuperMapNet for long-range and high-accuracy vectorized HD map construction. ISPRS Journal of Photogrammetry and Remote Sensing, Volume 233, Pages 89–103.
Pub Date: 2026-03-01. Epub Date: 2026-02-11. DOI: 10.1016/j.isprsjprs.2026.01.041
Qi Li , Lan Zhang , Xi Chen , Chen Zhang , Jingyi Tian , Xianghan Sun , Liqiao Tian
Quantifying riverine total phosphorus (TP) concentration at the global scale using remote sensing remains challenging because TP is not optically active and its spatial variability is strongly regulated by hydrological and environmental processes. In this study, a global-scale dataset comprising 25,060 in situ TP measurements from 75 major river basins was used to examine how satellite-derived reflectance, river morphology, hydrological conditions, topography, and climate jointly constrain TP variability. The results demonstrate that integrating spectral and environmental predictors substantially improves the stability and transferability of TP estimation across heterogeneous river systems. Further improvements are achieved through stacked ensemble learning (R² = 0.80, RMSE = 0.5204, MAE = 0.3692), which effectively leverages the complementary strengths of different learning algorithms in processing both optical and environmental information. The resulting global riverine TP distribution patterns exhibit coherent latitudinal and regional gradients associated with river size, climatic regimes, and anthropogenic pressure, supporting the physical consistency of the estimates. Model explanation indicates that environmental factors such as elevation, river width, and discharge play key regulatory roles alongside spectral information. These findings demonstrate that integrating multi-source data and employing ensemble modeling approaches provides a viable pathway for large-scale estimation of non-optically active water quality parameters.
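The stacking step can be sketched as out-of-fold base predictions feeding a least-squares meta-learner (a generic stacking recipe, not the authors' exact configuration; the base learners here are placeholders):

```python
import numpy as np

def fit_stacked_ensemble(X, y, base_fits, n_folds=5):
    """Minimal stacked ensemble: out-of-fold (OOF) predictions from each
    base learner become features for a linear meta-learner, so the meta
    weights are fit on predictions the base models did not train on.

    base_fits: list of functions (X_train, y_train, X_test) -> predictions.
    Returns meta-learner weights (one per base learner, plus intercept).
    """
    n = len(X)
    folds = np.array_split(np.arange(n), n_folds)
    oof = np.zeros((n, len(base_fits)))
    for j, fit in enumerate(base_fits):
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            oof[test_idx, j] = fit(X[train_idx], y[train_idx], X[test_idx])
    A = np.column_stack([oof, np.ones(n)])       # add intercept column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares meta-learner
    return w

def predict_stacked(base_preds, w):
    """Combine base-learner predictions (n, n_bases) with the meta weights."""
    A = np.column_stack([base_preds, np.ones(len(base_preds))])
    return A @ w
```

In practice the base learners would be the different algorithms the paper pools (e.g., tree ensembles and neural regressors), each fed the spectral and environmental predictors.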
Title: Estimation of global riverine total phosphorus concentration based on multi-source data and stacked ensemble learning. ISPRS Journal of Photogrammetry and Remote Sensing, Volume 233, Pages 588–608.
Pub Date: 2026-03-01. Epub Date: 2026-01-22. DOI: 10.1016/j.isprsjprs.2026.01.017
Kai Hu , Jiaxin Li , Nan Ji , Xueshang Xiang , Kai Jiang , Xieping Gao
Knowledge distillation is extensively utilized in remote sensing object detection within resource-constrained environments. Among knowledge distillation methods, prediction imitation has garnered significant attention due to its ease of deployment. However, prevailing prediction imitation paradigms, which rely on an isolated, point-wise alignment of prediction scores, neglect crucial spatial semantic information. This oversight is particularly detrimental in remote sensing images due to the abundance of objects with weak feature responses. To this end, we propose a novel Spatial Semantic Enhanced Knowledge Distillation framework, called S²EKD, for remote sensing object detection. Through two complementary modules, S²EKD shifts the focus of prediction imitation from matching isolated values to learning structured spatial semantic information. First, for classification distillation, we introduce a Weak-feature Response Enhancement Module, which models the structured spatial relationships between objects and their background to establish an initial perception of objects with weak feature responses. Second, to capture more refined spatial information, we propose a Teacher Boundary Refinement Module for localization distillation. It provides robust boundary guidance by constructing a regression target enriched with more comprehensive spatial information. Furthermore, we introduce a Feature Mapping mechanism to ensure this spatial semantic knowledge is effectively utilized. Through extensive experiments on the DIOR and DOTA-v1.0 datasets, our method’s superiority is consistently demonstrated across diverse architectures, including both single-stage and two-stage detectors. The results show that S²EKD achieves state-of-the-art results and, in some cases, even surpasses the performance of its teacher model. The code will be made available soon.
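The point-wise prediction imitation that this paradigm builds on is classically a temperature-softened KL divergence between teacher and student class distributions; a minimal sketch of that baseline loss (generic knowledge distillation, not the proposed spatial-semantic modules):

```python
import numpy as np

def softened_softmax(z, t):
    """Temperature-softened softmax over the last axis (t > 1 flattens)."""
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / t)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, t=2.0):
    """Point-wise prediction imitation: KL(teacher || student) over
    temperature-softened class distributions, averaged over samples.
    The t*t factor keeps gradient magnitudes comparable across temperatures."""
    p = softened_softmax(teacher_logits, t)
    q = softened_softmax(student_logits, t)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)) * t * t)
```

Because this loss compares each prediction vector in isolation, it carries no information about spatial relationships between predictions, which is exactly the gap the paper's spatial-semantic modules target.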
Title: Knowledge distillation with spatial semantic enhancement for remote sensing object detection. ISPRS Journal of Photogrammetry and Remote Sensing, Volume 233, Pages 144–157.
Pub Date: 2026-03-01. Epub Date: 2026-01-20. DOI: 10.1016/j.isprsjprs.2026.01.012
Jiaxing Zhang , Chengjun Ge , Wen Xiao , Miao Tang , Jon Mills , Benjamin Coifman , Nengcheng Chen
Urban transportation systems are undergoing a paradigm shift with the integration of high-precision sensing technologies and intelligent perception frameworks. Roadside lidar, as a key enabler of infrastructure-based sensing technology, offers robust and precise 3D spatial understanding of dynamic urban scenes. This paper presents a comprehensive review of roadside lidar-based traffic perception, structured around five key modules: sensor placement strategies; multi-lidar point cloud fusion; dynamic traffic information extraction; subsequent applications, including trajectory prediction, collision risk assessment, and behavioral analysis; and representative roadside perception benchmark datasets. Despite notable progress, challenges remain in deployment optimization, robust registration under occlusion and dynamic conditions, generalizable object detection and tracking, and effective utilization of heterogeneous multi-modal data. Emerging trends point toward perception-driven infrastructure design, edge-cloud-terminal collaboration, and generalizable models enabled by domain adaptation, self-supervised learning, and foundation-scale datasets. This review aims to serve as a technical reference for researchers and practitioners, providing insights into current advances, open problems, and future directions in roadside lidar-based traffic perception and digital twin applications.
Title: Roadside lidar-based scene understanding toward intelligent traffic perception: A comprehensive review. ISPRS Journal of Photogrammetry and Remote Sensing, Volume 233, Pages 69–88.
Pub Date: 2026-03-01. Epub Date: 2026-01-31. DOI: 10.1016/j.isprsjprs.2026.01.007
Yitong Luo , Xiaolan Qiu , Bei Lin , Zekun Jiao , Wei Wang , Chibiao Ding
The networking capability of SAR constellations can effectively reduce the average revisit period, which has become a new trend in SAR Earth observation. However, the system electronic delays of the several, or even dozens of, SAR satellites in a constellation must be calibrated and monitored over long periods to ensure high geometric accuracy of the products. In this paper, a geometric cross-propagation-calibration method for SAR constellations is proposed, which can calibrate the slant ranges of the SAR satellites in a constellation without any calibrators. The proposed method constructs a graph from all reference and uncalibrated SAR images involved in a cross-calibration task. For each uncalibrated image, the cumulative calibration error along paths originating from the reference images is estimated, enabling the identification of a path that minimizes this error. Cross-calibration is then performed sequentially along this optimal path. A closed-form expression is derived to estimate the cumulative calibration error along any path, which also reveals the underlying mechanism of error propagation in cross-calibration. Experiments on real data show that the proposed method enables two Chinese microsatellites, Qilu-1 and Xingrui-9, to achieve geometric accuracy better than 5 m after calibration.
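The path-selection step maps naturally onto multi-source shortest-path search; a sketch under the simplifying assumption that per-hop calibration error variances add independently along a path (the paper's closed-form error model may differ):

```python
import heapq

def best_calibration_path(edges, references, target):
    """Pick the cross-calibration path from any reference image to `target`
    that minimizes accumulated calibration error, via multi-source Dijkstra.

    edges: dict node -> list of (neighbor, per-hop error variance).
    references: iterable of reference image ids (zero starting error).
    Returns (path as list of nodes, cumulative error at target).
    """
    dist = {r: 0.0 for r in references}
    prev = {}
    heap = [(0.0, r) for r in references]
    heapq.heapify(heap)
    seen = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        if u == target:
            break
        for v, w in edges.get(u, []):
            nd = d + w                      # independent errors add
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [target]
    while path[-1] in prev:
        path.append(prev[path[-1]])
    return path[::-1], dist.get(target, float("inf"))
```

Seeding all reference images with zero error lets one search choose both the best reference and the best propagation route at once.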
Title: A geometric Cross-Propagation-Calibration method for SAR constellation based on the graph theory. ISPRS Journal of Photogrammetry and Remote Sensing, Volume 233, Pages 346–359.
Pub Date: 2026-03-01. Epub Date: 2026-02-05. DOI: 10.1016/j.isprsjprs.2026.01.036
Tao Wang , Chenyu Lin , Chenwei Tang , Jizhe Zhou , Deng Xiong , Jianan Li , Jian Zhao , Jiancheng Lv
Detecting objects from UAV-captured images is challenging due to the small object size. In this work, a simple and efficient adaptive zoom-in framework is explored for object detection on UAV images. The main motivation is that the foreground objects are generally smaller and sparser than those in common scene images, which hinders the optimization of effective object detectors. We thus aim to zoom in adaptively on the objects to better capture object features for the detection task. To achieve this goal, two core designs are required: (i) How can non-uniform zooming be conducted efficiently on each image? (ii) How can object detection training and inference be enabled in the zoomed image space? Correspondingly, a lightweight offset prediction scheme coupled with a novel box-based zooming objective is introduced to learn non-uniform zooming on the input image. Based on the learned zooming transformation, a corner-aligned bounding box transformation method is proposed. The method warps the ground-truth bounding boxes into the zoomed space to learn object detection, and warps the predicted bounding boxes back to the original space during inference. We conduct extensive experiments on three representative UAV object detection datasets: VisDrone, UAVDT, and SeaDronesSee. The proposed ZoomDet is architecture-independent and can be applied to an arbitrary object detection architecture. Remarkably, on the SeaDronesSee dataset, ZoomDet delivers an absolute mAP gain of more than 8.4 points with a Faster R-CNN model, at only about 3 ms of additional latency. The code is available at https://github.com/twangnh/zoomdet_code.
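The corner-aligned warping of boxes through a monotone zoom can be sketched as follows (the piecewise-linear per-axis zoom below is a hypothetical stand-in for the learned offset-based transformation; only the corner-warping logic is generic):

```python
import numpy as np

def warp_boxes(boxes, fx, fy):
    """Corner-aligned box transformation: map each box between original and
    zoomed image space by warping its two opposite corners through the
    monotone per-axis mappings fx (x-axis) and fy (y-axis).

    boxes: (N, 4) array of [x1, y1, x2, y2]. Monotonicity of fx/fy
    guarantees the warped corners remain a valid (x1<x2, y1<y2) box.
    """
    x1, y1, x2, y2 = boxes.T
    return np.stack([fx(x1), fy(y1), fx(x2), fy(y2)], axis=1)

# Hypothetical non-uniform zoom on a 100-px axis: the central strip
# [40, 60] is magnified to [20, 80]; outer regions are compressed.
knots_in = np.array([0.0, 40.0, 60.0, 100.0])
knots_out = np.array([0.0, 20.0, 80.0, 100.0])
zoom = lambda t: np.interp(t, knots_in, knots_out)      # original -> zoomed
unzoom = lambda t: np.interp(t, knots_out, knots_in)    # zoomed -> original
```

Training would warp ground-truth boxes with `zoom`; at inference, predicted boxes are mapped back with `unzoom`, making the round trip lossless for a monotone mapping.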
Title: Adaptive image zoom-in with bounding box transformation for UAV object detection. ISPRS Journal of Photogrammetry and Remote Sensing, Volume 233, Pages 452–466.
Pub Date: 2026-03-01. Epub Date: 2026-01-23. DOI: 10.1016/j.isprsjprs.2026.01.030
Wenpeng Zhao , Shanchuan Guo , Xueliang Zhang , Pengfei Tang , Xiaoquan Pan , Haowei Mu , Chenghan Yang , Zilong Xia , Zheng Wang , Jun Du , Peijun Du
Large-scale and fine-grained extraction of agricultural parcels from very-high-resolution (VHR) imagery is essential for precision agriculture. However, traditional parcel segmentation methods and fully supervised deep learning approaches typically face scalability constraints due to costly manual annotations, while extraction accuracy is generally limited by the inadequate capacity of segmentation architectures to represent complex agricultural scenes. To address these challenges, this study proposes a Weakly Supervised approach for agricultural Parcel Extraction (WSPE), which leverages publicly available 10 m resolution images and labels to guide the delineation of 0.5 m agricultural parcels. The WSPE framework integrates a tabular foundation model (Tabular Prior-data Fitted Network, TabPFN) and a vision foundation model (Segment Anything Model 2, SAM2) to generate initial pseudo-labels with high geometric precision. These pseudo-labels are further refined for semantic accuracy through an adaptive noisy label correction module based on curriculum learning. The refined knowledge is distilled into the proposed Triple-branch Kolmogorov-Arnold enhanced Boundary-aware Network (TKBNet), a prompt-free end-to-end architecture enabling rapid inference and scalable deployment, with outputs vectorized through post-processing. The effectiveness of WSPE was evaluated on a self-constructed dataset from nine agricultural zones in China, the public AI4Boundaries and FGFD datasets, and three large-scale regions: Zhoukou, Hengshui, and Fengcheng. Results demonstrate that WSPE and its integrated TKBNet achieve robust performance across datasets with diverse agricultural scenes, validated by extensive comparative and ablation experiments. The weakly supervised approach achieves 97.7 % of fully supervised performance, and large-scale deployment verifies its scalability and generalization, offering a practical solution for fine-grained, large-scale agricultural parcel mapping. 
Code is available at https://github.com/zhaowenpeng/WSPE.
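The abstract describes WSPE's adaptive noisy-label correction only at a high level. As a hedged illustration of the underlying curriculum-learning idea, the sketch below implements the common small-loss selection heuristic: treat low-loss pseudo-labeled samples as presumably clean, and discard a growing fraction of the highest-loss ones as training proceeds. The function name and schedule are assumptions for illustration, not the paper's design.

```python
import numpy as np

def select_clean(losses, epoch, total_epochs, max_drop=0.3):
    """Small-loss curriculum selection for noisy pseudo-labels.

    losses: per-sample training losses under the current model.
    Returns indices of samples kept as presumed-clean. The dropped
    fraction ramps linearly from 0 to max_drop over the first half
    of training, then stays flat.
    """
    drop = max_drop * min(1.0, epoch / (0.5 * total_epochs))
    keep = int(len(losses) * (1.0 - drop))
    return np.argsort(losses)[:keep]  # lowest-loss samples first
```

Early in training the model has not yet memorized label noise, so nearly all samples are kept; the keep-set then shrinks toward the most confidently clean pseudo-labels, which is the usual rationale behind small-loss curricula.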
{"title":"A weakly supervised approach for large-scale agricultural parcel extraction from VHR imagery via foundation models and adaptive noise correction","authors":"Wenpeng Zhao , Shanchuan Guo , Xueliang Zhang , Pengfei Tang , Xiaoquan Pan , Haowei Mu , Chenghan Yang , Zilong Xia , Zheng Wang , Jun Du , Peijun Du","doi":"10.1016/j.isprsjprs.2026.01.030","DOIUrl":"10.1016/j.isprsjprs.2026.01.030","url":null,"abstract":"<div><div>Large-scale and fine-grained extraction of agricultural parcels from very-high-resolution (VHR) imagery is essential for precision agriculture. However, traditional parcel segmentation methods and fully supervised deep learning approaches typically face scalability constraints due to costly manual annotations, while extraction accuracy is generally limited by the inadequate capacity of segmentation architectures to represent complex agricultural scenes. To address these challenges, this study proposes a Weakly Supervised approach for agricultural Parcel Extraction (WSPE), which leverages publicly available 10 m resolution images and labels to guide the delineation of 0.5 m agricultural parcels. The WSPE framework integrates the tabular (Tabular Prior-data Fitted Network, TabPFN) and the vision foundation model (Segment Anything Model 2, SAM2) to initially generate pseudo-labels with high geometric precision. These pseudo-labels are further refined for semantic accuracy through an adaptive noisy label correction module based on curriculum learning. The refined knowledge is distilled into the proposed Triple-branch Kolmogorov-Arnold enhanced Boundary-aware Network (TKBNet), a prompt-free end-to-end architecture enabling rapid inference and scalable deployment, with outputs vectorized through post-processing. The effectiveness of WSPE was evaluated on a self-constructed dataset from nine agricultural zones in China, the public AI4Boundaries and FGFD datasets, and three large-scale regions: Zhoukou, Hengshui, and Fengcheng. 
Results demonstrate that WSPE and its integrated TKBNet achieve robust performance across datasets with diverse agricultural scenes, validated by extensive comparative and ablation experiments. The weakly supervised approach achieves 97.7 % of fully supervised performance, and large-scale deployment verifies its scalability and generalization, offering a practical solution for fine-grained, large-scale agricultural parcel mapping. Code is available at <span><span>https://github.com/zhaowenpeng/WSPE</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"233 ","pages":"Pages 180-208"},"PeriodicalIF":12.2,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-03-01Epub Date: 2026-01-27DOI: 10.1016/j.isprsjprs.2026.01.031
Josef Taher , Eric Hyyppä , Matti Hyyppä , Klaara Salolahti , Xiaowei Yu , Leena Matikainen , Antero Kukko , Matti Lehtomäki , Harri Kaartinen , Sopitta Thurachen , Paula Litkey , Ville Luoma , Markus Holopainen , Gefei Kong , Hongchao Fan , Petri Rönnholm , Matti Vaaja , Antti Polvivaara , Samuli Junttila , Mikko Vastaranta , Juha Hyyppä
<div><div>Climate-smart and biodiversity-preserving forestry demands precise information on forest resources, extending to the individual tree level. Multispectral airborne laser scanning (ALS) has shown promise in automated point cloud processing, but challenges remain in leveraging deep learning techniques and identifying rare tree species in class-imbalanced datasets. This study addresses these gaps by conducting a comprehensive benchmark of deep learning and traditional shallow machine learning methods for tree species classification. For the study, we collected high-density multispectral ALS data (<span><math><mrow><mo>></mo><mn>1000</mn></mrow></math></span> <span><math><mrow><mi>pts</mi><mo>/</mo><msup><mrow><mi>m</mi></mrow><mrow><mn>2</mn></mrow></msup></mrow></math></span>) at three wavelengths using the FGI-developed HeliALS system, complemented by existing Optech Titan data (35 <span><math><mrow><mi>pts</mi><mo>/</mo><msup><mrow><mi>m</mi></mrow><mrow><mn>2</mn></mrow></msup></mrow></math></span>), to evaluate the species classification accuracy of various algorithms in a peri-urban study area located in southern Finland. We established a field reference dataset of 6326 segments across nine species using a newly developed browser-based crowdsourcing tool, which facilitated efficient data annotation. The ALS data, including a training dataset of 1065 segments, was shared with the scientific community to foster collaborative research and diverse algorithmic contributions. Based on 5261 test segments, our findings demonstrate that point-based deep learning methods, particularly a point transformer model, outperformed traditional machine learning and image-based deep learning approaches on high-density multispectral point clouds. 
For the high-density ALS dataset, a point transformer model provided the best performance, reaching an overall (macro-average) accuracy of 87.9% (74.5%) with a training set of 1065 segments and 92.0% (85.1%) with a larger training set of 5000 segments. With 1065 training segments, the best image-based deep learning method, DetailView, reached an overall (macro-average) accuracy of 84.3% (63.9%), whereas a shallow random forest (RF) classifier achieved an overall (macro-average) accuracy of 83.2% (61.3%). For the sparser ALS dataset, an RF model topped the list with an overall (macro-average) accuracy of 79.9% (57.6%), closely followed by the point transformer at 79.6% (56.0%). Importantly, the overall classification accuracy of the point transformer model on the HeliALS data increased from 73.0% with no spectral information to 84.7% with single-channel reflectance, and to 87.9% with spectral information from all three channels. Furthermore, we studied the scaling of the classification accuracy as a function of point density and training set size using 5-fold cross-validation of our dataset. Based on our findings, multispectral information is especially beneficial for sparse point clouds with 1–50 <span><math>
{"title":"Multispectral airborne laser scanning for tree species classification: A benchmark of machine learning and deep learning algorithms","authors":"Josef Taher , Eric Hyyppä , Matti Hyyppä , Klaara Salolahti , Xiaowei Yu , Leena Matikainen , Antero Kukko , Matti Lehtomäki , Harri Kaartinen , Sopitta Thurachen , Paula Litkey , Ville Luoma , Markus Holopainen , Gefei Kong , Hongchao Fan , Petri Rönnholm , Matti Vaaja , Antti Polvivaara , Samuli Junttila , Mikko Vastaranta , Juha Hyyppä","doi":"10.1016/j.isprsjprs.2026.01.031","DOIUrl":"10.1016/j.isprsjprs.2026.01.031","url":null,"abstract":"<div><div>Climate-smart and biodiversity-preserving forestry demands precise information on forest resources, extending to the individual tree level. Multispectral airborne laser scanning (ALS) has shown promise in automated point cloud processing, but challenges remain in leveraging deep learning techniques and identifying rare tree species in class-imbalanced datasets. This study addresses these gaps by conducting a comprehensive benchmark of deep learning and traditional shallow machine learning methods for tree species classification. For the study, we collected high-density multispectral ALS data (<span><math><mrow><mo>></mo><mn>1000</mn></mrow></math></span> <span><math><mrow><mi>pts</mi><mo>/</mo><msup><mrow><mi>m</mi></mrow><mrow><mn>2</mn></mrow></msup></mrow></math></span>) at three wavelengths using the FGI-developed HeliALS system, complemented by existing Optech Titan data (35 <span><math><mrow><mi>pts</mi><mo>/</mo><msup><mrow><mi>m</mi></mrow><mrow><mn>2</mn></mrow></msup></mrow></math></span>), to evaluate the species classification accuracy of various algorithms in a peri-urban study area located in southern Finland. We established a field reference dataset of 6326 segments across nine species using a newly developed browser-based crowdsourcing tool, which facilitated efficient data annotation. 
The ALS data, including a training dataset of 1065 segments, was shared with the scientific community to foster collaborative research and diverse algorithmic contributions. Based on 5261 test segments, our findings demonstrate that point-based deep learning methods, particularly a point transformer model, outperformed traditional machine learning and image-based deep learning approaches on high-density multispectral point clouds. For the high-density ALS dataset, a point transformer model provided the best performance, reaching an overall (macro-average) accuracy of 87.9% (74.5%) with a training set of 1065 segments and 92.0% (85.1%) with a larger training set of 5000 segments. With 1065 training segments, the best image-based deep learning method, DetailView, reached an overall (macro-average) accuracy of 84.3% (63.9%), whereas a shallow random forest (RF) classifier achieved an overall (macro-average) accuracy of 83.2% (61.3%). For the sparser ALS dataset, an RF model topped the list with an overall (macro-average) accuracy of 79.9% (57.6%), closely followed by the point transformer at 79.6% (56.0%). Importantly, the overall classification accuracy of the point transformer model on the HeliALS data increased from 73.0% with no spectral information to 84.7% with single-channel reflectance, and to 87.9% with spectral information from all three channels. Furthermore, we studied the scaling of the classification accuracy as a function of point density and training set size using 5-fold cross-validation of our dataset. 
Based on our findings, multispectral information is especially beneficial for sparse point clouds with 1–50 <span><math>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"233 ","pages":"Pages 278-309"},"PeriodicalIF":12.2,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146072730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
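The benchmark above reports each method's overall accuracy alongside its macro-average accuracy, and the gap between the two is driven by rare species in the class-imbalanced dataset. For reference, both metrics follow from standard definitions (this is a generic sketch, not the study's evaluation code):

```python
import numpy as np

def overall_and_macro_accuracy(y_true, y_pred):
    """Overall accuracy is the fraction of correct predictions; macro-average
    accuracy is the unweighted mean of per-class recalls, so a rare species
    with few segments counts as much as a common one."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    overall = float(np.mean(y_true == y_pred))
    recalls = [float(np.mean(y_pred[y_true == c] == c))
               for c in np.unique(y_true)]
    return overall, float(np.mean(recalls))
```

A classifier that ignores rare species can still post a high overall accuracy, which is why the macro-average figures in parentheses are consistently lower than the overall figures in the results above.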
Fine-grained land-cover mapping is crucial for accurately assessing environmental degradation and monitoring socioeconomic dynamics. Few-shot learning of hyperspectral images offers a promising solution in cases where sample collection is limited. However, previous studies, such as tree species mapping, typically use 1% or 0.5% of samples per class, yielding thousands of samples for common species but struggling to identify unseen or rare species (only one sample/shot) in real-world scenarios. Furthermore, inevitable cross-sensor, cross-category, and cross-scene variations significantly increase the occurrence of unseen or rare classes and spectral heterogeneity within common land-cover types. To this end, we propose Knowing-Net, a knowledge-data-model-driven multimodal few-shot learning network, to bridge the application gap for fine-grained mapping of unseen or rare classes. In Knowing-Net, prior knowledge of the sensor, i.e., its spectral parameters, is leveraged to reconstruct cross-sensor hyperspectral images, mitigating heterogeneity in spectral responses across datasets and enabling cross-domain transfer across different sensors, scenes, and land cover types. To close the gap in recognizing unseen classes, multimodal data, including textual descriptions and natural images of unseen classes, is embedded into the network to construct shared side information through modality-specific feature learning. By designing a cross-alignment mechanism for hyperspectral and multimodal information in a shared semantic space, distinct encoders are guided to produce consistent distributions for the same class across different modalities, reducing sample dependency and facilitating the identification of unseen or rare classes. Finally, inspired by the first law of geography, a sliding discriminant window is designed to incorporate spatial context, enhancing geographic interpretability and robustness to noise. 
We evaluate Knowing-Net on five challenging airborne hyperspectral datasets with a fine-grained classification system, covering crop type, tree species, and similar urban land covers with varying materials. Extensive experiments on the five datasets consistently demonstrate Knowing-Net's superiority over state-of-the-art methods in both mapping performance and cross-domain generalization. Notably, the unified framework achieves state-of-the-art results in one-shot learning and establishes a new paradigm in zero-shot classification for fine-grained land cover tasks. To the best of our knowledge, this is the first comprehensive generalization of FSL across sensor, category, and scene for hyperspectral image-based fine mapping.
{"title":"Knowledge-data-model-driven multimodal few-shot learning for hyperspectral fine classification: Generalization across sensor, category and scene","authors":"Qiqi Zhu, Mingzhen Xu, Rui Ma, Longli Ran, Jiayao Xue, Qingfeng Guan","doi":"10.1016/j.isprsjprs.2026.02.001","DOIUrl":"10.1016/j.isprsjprs.2026.02.001","url":null,"abstract":"<div><div>Fine-grained land-cover mapping is crucial for accurately assessing environmental degradation and monitoring socioeconomic dynamics. Few-shot learning of hyperspectral images offers a promising solution in cases where sample collection is limited. However, previous studies, such as tree species mapping, typically use 1% or 0.5% of samples per class, yielding thousands of samples for common species but struggling to identify unseen or rare species (only one sample/shot) in real-world scenarios. Furthermore, inevitable cross-sensor, cross-category, and cross-scene variations significantly increase the occurrence of unseen or rare classes and spectral heterogeneity within common land-cover types. To this being, we propose Knowing-Net, a knowledge-data-model-driven multimodal few-shot learning network, to bridge the application gap for fine-grained mapping of unseen or rare classes. In Knowing-Net, prior knowledge of sensor, i.e., spectral parameters, is leveraged to reconstruct cross-sensor hyperspectral images, mitigating heterogeneity in spectral responses across datasets and enabling cross-domain transfer across different sensors, scenes, and land cover types. To breakthrough the gap in recognizing unseen classes, multimodal data, including textual descriptions and natural images of unseen classes, is embedded into network to construct shared side information through modality-specific feature learning. 
By designing a cross-alignment mechanism for hyperspectral and multimodal information in a shared semantic space, distinct encoders are guided to produce consistent distributions for the same class across different modalities, reducing sample dependency and facilitating the identification of unseen or rare classes. Finally, inspired by the first law of geography, a sliding discriminant window is designed to incorporate spatial context, enhancing geographic interpretability and robustness to noise. We evaluate Knowing-Net on five challenging airborne hyperspectral datasets with a fine-grained classification system, covering crop type, tree species, and similar urban land covers with varying materials. Extensive experiments on the five datasets consistently demonstrate Knowing-Net's superiority over state-of-the-art methods in both mapping performance and cross-domain generalization. Notably, the unified framework achieves state-of-the-art results in one-shot learning and establishes a new paradigm in zero-shot classification for fine-grained land cover tasks. To the best of our knowledge, this is the first comprehensive generalization of FSL across sensor, category, and scene for hyperspectral image-based fine mapping.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"233 ","pages":"Pages 623-650"},"PeriodicalIF":12.2,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146160594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
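The cross-alignment idea in the Knowing-Net abstract, once encoders for different modalities map into a shared semantic space, reduces zero- and one-shot classification to nearest-prototype matching. The sketch below shows only that final matching step under the assumption that embeddings are already aligned; it is a generic illustration, not the paper's architecture, and `cosine_prototype_classify` is a hypothetical name.

```python
import numpy as np

def cosine_prototype_classify(query_emb, prototypes):
    """Assign each query embedding to the class whose prototype is closest
    by cosine similarity. Prototypes may be built from textual or natural-
    image side information (zero-shot) or from one labeled sample per
    class (one-shot), provided all embeddings live in the shared space.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return np.argmax(q @ p.T, axis=1)  # index of best-matching class
```

Because the decision rule needs only prototypes, adding an unseen class amounts to embedding its description, which is what makes such shared-space designs attractive for rare or unseen land-cover types.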