
Latest articles from ISPRS Journal of Photogrammetry and Remote Sensing

Multimodal remote sensing change detection: An image matching perspective
IF 12.2 CAS Zone 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date: 2026-03-01 Epub Date: 2026-02-06 DOI: 10.1016/j.isprsjprs.2026.02.004 Vol. 233, Pages 487-501
Hongruixuan Chen , Cuiling Lan , Jian Song , Damian Ibañez , Junshi Xia , Konrad Schindler , Naoto Yokoya
Change Detection (CD) between images of different modalities is a fundamental capability for remote sensing. In this work, we pinpoint the commonalities between Multimodal Change Detection (MCD) and Multimodal Image Matching (MIM). Accordingly, we present a new unsupervised CD framework designed from the perspective of Image Matching (IM), called IM4CD, which unifies the IM and CD tasks into a single, coherent framework. We abandon the strategy prevalent in MCD of comparing per-pixel image features, since it is in practice quite difficult to design features that are truly invariant across modalities. Instead, we compute similarity by local template matching and use the spatial offset of response peaks to represent change intensity between images of different modalities, integrating this tightly with the co-registration of the two images, which in any case includes such a matching step. In this way, the same off-the-shelf descriptors used for MIM also support MCD. Concretely, we first extract modality-independent features, then detect salient points to obtain initial pairs of corresponding Control Points (CPs). When those points are matched to accurately register the images, CP pairs located in unchanged areas show low residuals, whereas those in changed areas show high residuals. The CPs can then be connected into a Conditional Random Field (CRF) that leverages modality-independent structural relationships to estimate dense change maps. Experimental results show the effectiveness of our method, including robustness to registration errors, compatibility with different image descriptors, and promising potential for challenging real-world disaster-response scenarios.
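The core idea above — using the spatial offset of a template-matching response peak as a change-intensity score — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature maps, window sizes, and the plain normalized cross-correlation used here are all illustrative assumptions.

```python
import numpy as np

def match_offset(feat_a, feat_b, cp, tmpl=7, search=15):
    """Template-match a (tmpl x tmpl) patch of feat_b centred on control
    point `cp` inside a (search x search) window of feat_a, and return the
    offset of the correlation peak from the window centre. Large offsets
    (high matching residuals) suggest change; small offsets suggest an
    unchanged, well-registered area."""
    y, x = cp
    t, s = tmpl // 2, search // 2
    template = feat_b[y - t:y + t + 1, x - t:x + t + 1]
    window = feat_a[y - s:y + s + 1, x - s:x + s + 1]

    n = search - tmpl + 1                 # candidate positions per axis
    scores = np.full((n, n), -np.inf)
    tz = (template - template.mean()) / (template.std() + 1e-8)
    for i in range(n):
        for j in range(n):
            patch = window[i:i + tmpl, j:j + tmpl]
            pz = (patch - patch.mean()) / (patch.std() + 1e-8)
            scores[i, j] = (tz * pz).mean()   # normalised cross-correlation

    pi, pj = np.unravel_index(np.argmax(scores), scores.shape)
    centre = (n - 1) / 2
    return np.hypot(pi - centre, pj - centre)  # residual = change-intensity proxy

# Unchanged area: identical structure, so the peak sits at the centre.
rng = np.random.default_rng(0)
a = rng.standard_normal((31, 31))
print(match_offset(a, a.copy(), cp=(15, 15)))  # 0.0
```

A shifted (or changed) scene moves the response peak away from the centre, and that residual is exactly what IM4CD reads as change intensity.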
MMP-Mapper: Multi-modal priors enhancing vectorized HD road map construction from aerial imagery
IF 12.2 CAS Zone 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date: 2026-03-01 Epub Date: 2026-02-07 DOI: 10.1016/j.isprsjprs.2026.02.008 Vol. 233, Pages 543-555
Haofeng Xie , Huiwei Jiang , Yandi Yang , Xiangyun Hu
High-definition (HD) road maps are indispensable for autonomous driving, supporting tasks such as localization, planning, and navigation. The traditional construction of HD road maps relies heavily on manual annotation of data from LiDAR, cameras, and GPS/IMU, a process that is both costly and time-consuming. While recent work has explored automatic HD road map extraction from aerial imagery, a data source offering broad-area coverage and superior robustness, existing methods face a critical limitation: they often process only a single, isolated image tile, failing to leverage crucial spatial context and semantic priors from multi-modal data sources. This shortcoming severely impacts map accuracy and continuity, especially at complex intersections and in occluded areas. To overcome these challenges, we propose MMP-Mapper, a novel framework that enhances HD road map construction with multi-modal priors. MMP-Mapper introduces two key modules: (1) the Contextual Image Fusion (CIF) module, which selects and fuses features from neighboring image tiles to provide spatial continuity; and (2) the Map-Guided Fusion (MGF) module, which uses a Transformer to fuse the encoded semantic attributes of standard-definition (SD) road maps with geometric priors, guiding HD road map construction. We validate our framework on the Aerial Argoverse 2 and OpenSatMap datasets. Our results demonstrate that MMP-Mapper outperforms state-of-the-art baselines in both accuracy and generalization for aerial-imagery-based HD road map construction.
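The CIF module's role — injecting context from neighboring tiles into the centre tile — might be sketched roughly as follows. This is a loose stand-in, not the paper's architecture: the real module selects and fuses learned features, whereas here neighbors are simply pooled to global descriptors and broadcast back onto the centre tile.

```python
import numpy as np

def fuse_neighbor_tiles(center, neighbors):
    """Fuse a centre tile's feature map (C, H, W) with pooled context
    from neighbouring tiles, so per-pixel predictions on the centre tile
    can see beyond the tile boundary."""
    # Global-average-pool each neighbour to a (C,) descriptor.
    ctx = np.stack([n.mean(axis=(1, 2)) for n in neighbors])  # (N, C)
    ctx = ctx.mean(axis=0)                                    # aggregate neighbours
    # Broadcast the context vector over the spatial grid and concatenate.
    c, h, w = center.shape
    ctx_map = np.broadcast_to(ctx[:, None, None], (c, h, w))
    return np.concatenate([center, ctx_map], axis=0)          # (2C, H, W)

center = np.zeros((8, 16, 16))
neighbors = [np.ones((8, 16, 16)) for _ in range(4)]
fused = fuse_neighbor_tiles(center, neighbors)
print(fused.shape)  # (16, 16, 16)
```

A learned variant would replace the unweighted mean with attention over neighbor descriptors, which is closer in spirit to the selection step the abstract describes.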
Satellite-based heat Index estimatioN modEl (SHINE): An integrated machine learning approach for the conterminous United States
IF 12.2 CAS Zone 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date: 2026-03-01 Epub Date: 2026-01-23 DOI: 10.1016/j.isprsjprs.2026.01.018 Vol. 233, Pages 209-230
Seyed Babak Haji Seyed Asadollah, Giorgos Mountrakis, Stephen B. Shaw
The accelerating frequency, duration and intensity of extreme heat events demand accurate, spatially complete heat exposure metrics. Here, a modeling approach is presented for estimating the daily-maximum Heat Index (HI) at 1 km spatial resolution. Our study area covered the conterminous United States (CONUS) during the warm season (May to September) between 2003 and 2023. More than 4.6 million observations from approximately 2000 weather stations were paired with weather-related, geographical, land cover and historical climatic factors to develop the proposed Satellite-based Heat Index estimatioN modEl (SHINE). Selected explanatory variables at daily temporal intervals included reanalysis products from Modern-Era Retrospective analysis for Research and Applications (MERRA) and direct satellite products from the Moderate Resolution Imaging Spectroradiometer (MODIS) sensor.
The most influential variables for HI estimation were the MERRA surface layer height and specific humidity products and the dual-pass MODIS daily land surface temperatures. These were followed by land cover products capturing water and forest presence, historical norms of wind speed and maximum temperature, elevation information and the corresponding day of year. An Extreme Gradient Boosting (XGBoost) regressor trained with spatial cross-validation explained 93% of the variance (R² = 0.93) and attained a Root Mean Square Error (RMSE) of 1.9°C and a Mean Absolute Error (MAE) of 1.4°C. Comparison of alternative configurations showed that while a MERRA-only model provided slightly higher accuracy (RMSE of 1.8°C), its coarse resolution failed to capture fine-scale heat variations. Conversely, a MODIS-only model offered kilometer-scale spatial resolution but with higher estimation errors (RMSE of 2.9°C). Integrating both MERRA and MODIS sources enabled SHINE to maintain spatial detail and preserve accuracy, underscoring the complementary strengths of reanalysis and satellite products. SHINE also demonstrated resistance to missing MODIS LST observations due to clouds, as the additional RMSE was approximately 0.5°C in the worst case of missing both morning and afternoon MODIS land surface temperature observations. Spatial error analysis revealed <1.7°C RMSE in arid and Mediterranean zones but larger, more heterogeneous errors in the humid Midwest and High Plains. From the policy perspective, and considering the HI operational range for public-health heat effects, the proposed SHINE approach outperformed typically used proxies, such as land surface and air temperature. The resulting 1 km daily HI estimations can potentially serve as the foundation of the first wall-to-wall, multi-decadal, high-resolution heat dataset for CONUS and offer actionable information for public-health heat studies, energy-demand forecasting and environmental-justice implications.
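The spatial cross-validation used to evaluate SHINE can be illustrated with a small sketch: stations are grouped into coarse geographic blocks, and whole blocks are held out together so spatially autocorrelated neighbors never straddle a train/test split. The 2-degree cell size, the fold count, and the hashing of blocks below are illustrative assumptions; only the splitting logic and the reported error metrics (RMSE/MAE) are shown, not the XGBoost model itself.

```python
import numpy as np

def spatial_block_folds(lat, lon, k=5, cell=2.0, seed=0):
    """Assign stations to k cross-validation folds by gridding their
    coordinates into `cell`-degree blocks and shuffling whole blocks,
    so every station in a block lands in the same fold."""
    block = (np.floor(lat / cell).astype(int) * 10_000
             + np.floor(lon / cell).astype(int))    # unique id per grid cell
    uniq = np.unique(block)
    rng = np.random.default_rng(seed)
    fold_of_block = dict(zip(uniq, rng.permutation(len(uniq)) % k))
    return np.array([fold_of_block[b] for b in block])

def rmse_mae(y_true, y_pred):
    """The two headline error metrics reported for SHINE."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt((err ** 2).mean())), float(np.abs(err).mean())

# Synthetic CONUS-like station coordinates.
rng = np.random.default_rng(1)
lat, lon = rng.uniform(25, 49, 500), rng.uniform(-124, -67, 500)
folds = spatial_block_folds(lat, lon)
print(len(np.unique(folds)))  # 5
```

In practice the held-out folds would each be predicted by a regressor trained on the remaining blocks, and `rmse_mae` aggregated over all held-out predictions.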
Harnessing conditional generative adversarial networks for SAR-to-optical image translation via auxiliary geospatial landscape pattern-augmentation
IF 12.2 CAS Zone 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date: 2026-03-01 Epub Date: 2026-02-06 DOI: 10.1016/j.isprsjprs.2026.01.043 Vol. 233, Pages 502-518
Hongbo Liang , Xuezhi Yang , Xiangyu Yang , Xin Jing
Synthetic aperture radar (SAR) enables all-weather, all-day Earth observation, yet speckle noise and geometric distortions from complex electromagnetic scattering imaging severely limit its visual interpretability. SAR-to-optical image translation (S2OIT) has emerged to mitigate these challenges, but remains hindered by the data heterogeneity and spectral discrepancies between the SAR and optical domains, where integrating auxiliary knowledge offers a viable remedy. Moreover, previous studies, which rely on a pixel-wise constrained adversarial learning paradigm with limited mining of geospatial landscape information, are prone to generating low-fidelity images. To tackle these issues, we propose AGPA-CGAN, a conditional generative adversarial network (CGAN) framework with auxiliary geospatial landscape pattern-augmentation for high-quality S2OIT. AGPA-CGAN progressively narrows the gap between translated and reference images by integrating ample SAR prior properties and geospatial structural information from scenario image pairs into the S2OIT process. Specifically, to fully exploit the rich priors of SAR images, we design an auxiliary pseudo-scattering pattern integration (APSPI) module to extract hierarchical subspace frequency conditional representations, thereby aiding AGPA-CGAN in capturing more descriptive cues for S2OIT. In particular, we introduce an unsupervised subspace embedding clustering (SEC) algorithm based on subspace frequency analysis (SSFA) within APSPI to derive statistical pseudo-scattering behavior maps against SAR feature spectra. Furthermore, to stabilize the integration of SAR priors, we propose a geospatial landscape domain alignment (Geo-LDA) module that applies multi-perspective consistency regularization to align structural correspondences between SAR and optical features.
Extensive experiments on three challenging benchmarks demonstrate that AGPA-CGAN surpasses state-of-the-art (SOTA) methods in both translation fidelity and structural realism.
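The subspace frequency analysis (SSFA) that APSPI builds on is not detailed in the abstract; as a generic stand-in, the sketch below splits an image into radial frequency sub-bands with the 2-D FFT. The band edges are arbitrary assumptions chosen so the sub-bands partition the spectrum and sum back to the input.

```python
import numpy as np

def frequency_subbands(img, bands=((0.0, 0.1), (0.1, 0.3), (0.3, 0.8))):
    """Decompose an image into radial frequency sub-bands via the 2-D FFT.
    Each band keeps spectral coefficients whose normalised radial
    frequency falls in [lo, hi); the bands here tile the whole spectrum."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.hypot(fy, fx)                # normalised radial frequency
    spec = np.fft.fft2(img)
    out = []
    for lo, hi in bands:
        mask = (radius >= lo) & (radius < hi)
        out.append(np.real(np.fft.ifft2(spec * mask)))
    return out  # one array per sub-band; together they reconstruct img

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
subbands = frequency_subbands(img)
recon = sum(subbands)
print(np.allclose(recon, img))  # True
```

Per-band statistics of such decompositions are the kind of conditional representation a clustering step (like the paper's SEC) could operate on; the actual APSPI design is more elaborate.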
WHU-STree: A multi-modal benchmark dataset for street tree inventory
IF 12.2 CAS Zone 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date: 2026-03-01 Epub Date: 2026-02-06 DOI: 10.1016/j.isprsjprs.2026.02.011 Vol. 233, Pages 519-542
Ruifei Ding , Zhe Chen , Wen Fan , Chen Long , Huijuan Xiao , Yelu Zeng , Zhen Dong , Bisheng Yang
Street trees are vital to urban livability, providing ecological and social benefits. Establishing a detailed, accurate, and dynamically updated street tree inventory has become essential for optimizing these multifunctional assets within space-constrained urban environments. Given that traditional field surveys are time-consuming and labor-intensive, automated surveys utilizing Mobile Mapping Systems (MMS) offer a more efficient solution. However, existing MMS-acquired tree datasets are limited by small-scale scenes, limited annotations, or a single modality, restricting their utility for comprehensive analysis. To address these limitations, we introduce WHU-STree, a cross-city, richly annotated, and multi-modal urban street tree dataset. Collected across two distinct cities, WHU-STree integrates synchronized point clouds and high-resolution images, encompassing 21,007 annotated tree instances across 50 species and 2 morphological parameters. Leveraging these unique characteristics, WHU-STree concurrently supports over 10 tasks related to street tree inventory. We benchmark representative baselines for two key tasks, tree species classification and individual tree segmentation, based on 18 major species and an “Others” category. Extensive experiments demonstrate that while multi-modal fusion yields improvements over uni-modal baselines, it currently shows performance gaps compared to strong 3D-only methods, indicating that effective fusion remains a challenging open problem requiring further research. In particular, we identify key challenges and outline potential future work for fully exploiting WHU-STree, encompassing multi-modal fusion, multi-task collaboration, cross-domain generalization, spatial pattern learning, and Multi-modal Large Language Models for street tree asset management. The WHU-STree dataset is accessible at: https://github.com/WHU-USI3DV/WHU-STree.
EAV-DETR: Efficient Arbitrary-View oriented object detection with probabilistic guarantees for UAV imagery
IF 12.2 CAS Zone 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date: 2026-03-01 Epub Date: 2026-02-10 DOI: 10.1016/j.isprsjprs.2026.02.009 Vol. 233, Pages 575-587
Haoyu Zuo , Minghao Ning , Yiming Shu , Shucheng Huang , Chen Sun
Oriented object detection is critical for enhancing the visual perception of unmanned aerial vehicles (UAVs). However, existing detectors, primarily designed for general aerial imagery, often struggle to address the unique challenges of UAV imagery, including substantial scale variations, dense clustering, and arbitrary orientations. Furthermore, these models lack the probabilistic guarantees required for safety-critical applications. To address these challenges, we propose EAV-DETR, an efficient oriented object detection transformer designed for UAV imagery. Specifically, we first propose a novel scale-adaptive center supervision (SACS) strategy that explicitly enhances the encoder’s feature representations by imposing pixel-level localization constraints with zero inference overhead. Second, we design an anisotropic decoupled rotational attention (ADRA) module, which achieves superior feature alignment for objects of arbitrary morphology by generating a non-rigid adaptive sampling field. Finally, we propose a pose-aware Mondrian conformal prediction (PA-MCP) method, which utilizes the UAV’s flight pose as a physical prior for generating prediction sets with conditional coverage guarantees, thereby providing reliable uncertainty quantification. Extensive experiments on multiple aerial imagery datasets validate the effectiveness of our model. Compared to previous state-of-the-art methods, EAV-DETR improves AP75 on CODrone by 1.76% while achieving a 52% faster inference speed (46.38 vs 30.55 FPS), and improves AP50:95 on UAV-ROD by 3.17%. Our code is available at https://github.com/zzzhak/EAV-DETR.
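The Mondrian (group-conditional) conformal step behind PA-MCP can be illustrated independently of the detector: calibration nonconformity scores are split by group, here a hypothetical UAV pose bin, and each group gets its own quantile threshold, which is what restores per-group (conditional) coverage. This sketch shows only the standard split-conformal machinery, not the paper's pose binning or score design.

```python
import numpy as np

def mondrian_quantiles(scores, groups, alpha=0.1):
    """Per-group split-conformal thresholds: for each group, take the
    ceil((n+1)(1-alpha))-th smallest calibration nonconformity score,
    which guarantees >= 1-alpha coverage within that group."""
    q = {}
    for g in np.unique(groups):
        s = np.sort(scores[groups == g])
        n = len(s)
        k = int(np.ceil((n + 1) * (1 - alpha)))   # rank of the quantile
        q[g] = s[min(k, n) - 1]
    return q

# Calibration scores for two pose bins; bin 1 is noisier, so it gets a
# larger threshold -- exactly the per-group adaptation PA-MCP relies on.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.exponential(1.0, 200), rng.exponential(3.0, 200)])
groups = np.repeat([0, 1], 200)
q = mondrian_quantiles(scores, groups, alpha=0.1)
print(q[1] > q[0])  # True
```

At test time, a detection whose nonconformity score falls below its pose bin's threshold is kept in the prediction set, yielding the stated conditional coverage guarantee.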
{"title":"EAV-DETR: Efficient Arbitrary-View oriented object detection with probabilistic guarantees for UAV imagery","authors":"Haoyu Zuo ,&nbsp;Minghao Ning ,&nbsp;Yiming Shu ,&nbsp;Shucheng Huang ,&nbsp;Chen Sun","doi":"10.1016/j.isprsjprs.2026.02.009","DOIUrl":"10.1016/j.isprsjprs.2026.02.009","url":null,"abstract":"<div><div>Oriented object detection is critical for enhancing the visual perception of unmanned aerial vehicles (UAVs). However, existing detectors primarily designed for general aerial imagery often struggle to address the unique challenges of UAV imagery, including substantial scale variations, dense clustering, and arbitrary orientations. Furthermore, these models lack probabilistic guarantees required for safety-critical applications. To address these challenges, we propose EAV-DETR, an efficient oriented object detection transformer designed for UAV imagery. Specifically, we first propose a novel scale-adaptive center supervision (SACS) strategy that explicitly enhances the encoder’s feature representations by imposing pixel-level localization constraints with zero inference overhead. Second, we design an anisotropic decoupled rotational attention (ADRA) module, which achieves superior feature alignment for objects of arbitrary morphology by generating a non-rigid adaptive sampling field. Finally, we propose a pose-aware Mondrian conformal prediction (PA-MCP) method, which utilizes the UAV’s flight pose as a physical prior to generate prediction sets with conditional coverage guarantees, thereby providing reliable uncertainty quantification. Extensive experiments on multiple aerial imagery datasets validate the effectiveness of our model. 
Compared to previous state-of-the-art methods, EAV-DETR improves <span><math><msub><mrow><mtext>AP</mtext></mrow><mrow><mn>75</mn></mrow></msub></math></span> on CODrone by 1.76% while achieving a 52% faster inference speed (46.38 vs 30.55 FPS), and improves <span><math><msub><mrow><mtext>AP</mtext></mrow><mrow><mn>50</mn><mo>:</mo><mn>95</mn></mrow></msub></math></span> on UAV-ROD by 3.17%. Our code is available at <span><span>https://github.com/zzzhak/EAV-DETR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"233 ","pages":"Pages 575-587"},"PeriodicalIF":12.2,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146146708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
RegScorer: Learning to select the best transformation of point cloud registration
IF 12.2, CAS Tier 1 (Earth Science), Q1 GEOGRAPHY, PHYSICAL. Pub Date: 2026-03-01, Epub Date: 2026-01-27. DOI: 10.1016/j.isprsjprs.2026.01.034
Xiaochen Yang , Haiping Wang , Yuan Liu , Bisheng Yang , Zhen Dong
We propose RegScorer, a model that learns to identify the optimal transformation for registering unaligned point clouds. Existing registration methods can generate a set of candidate transformations, which are then evaluated using conventional metrics such as Inlier Ratio (IR), Mean Squared Error (MSE), or Chamfer Distance (CD). The candidate achieving the best score is selected as the final result. However, we argue that these metrics often fail to select the correct transformation, especially in challenging scenarios involving symmetric objects, repetitive structures, or low-overlap regions. This leads to significant degradation in registration performance, a problem that has long been overlooked. The core issue lies in their limited focus on local geometric consistency and inability to capture two key conflict cases of misalignment: (1) point pairs that are spatially close after alignment but have conflicting features, and (2) point pairs with high feature similarity but large spatial distances after alignment. To address this, we propose RegScorer, which models both the spatial and feature relationships of all point pairs. This allows RegScorer to learn to capture the above conflict cases and provides a more reliable score for transformation quality. On the 3DLoMatch and ScanNet datasets, RegScorer demonstrates 19.3% and 14.1% improvements in registration recall, leading to 4.7% and 5.1% accuracy gains in multiview registration. Moreover, when generalized to symmetric and low-texture outdoor scenes, RegScorer achieves a 25% increase in transformation recall over the IR metric, highlighting its robustness and generalizability. The pre-trained model and the complete code repository can be accessed at https://github.com/WHU-USI3DV/RegScorer.
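The conventional IR-based selection that RegScorer argues against can be sketched in a few lines: warp the source cloud with each candidate rigid transform and count how many points land within a distance tolerance of the target. A brute-force NumPy version with names of my own choosing, purely to illustrate the baseline being criticized:

```python
import numpy as np

def inlier_ratio(src, tgt, R, t, tau=0.1):
    """Fraction of transformed source points whose nearest target point
    lies within tau -- the IR score that, per the paper, can mis-rank
    candidates on symmetric or repetitive geometry."""
    warped = src @ R.T + t
    # pairwise distances (N_src x N_tgt); fine for small clouds only
    d = np.linalg.norm(warped[:, None, :] - tgt[None, :, :], axis=-1)
    return float((d.min(axis=1) < tau).mean())

def select_best(src, tgt, candidates, tau=0.1):
    """Pick the candidate (R, t) with the highest inlier ratio."""
    scores = [inlier_ratio(src, tgt, R, t, tau) for R, t in candidates]
    return int(np.argmax(scores)), scores
```

Because IR only checks local geometric proximity, two candidates can tie on symmetric structures; RegScorer's contribution is to break such ties by modeling feature conflicts as well.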
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 233, pp. 266–277, 2026.
VectorLLM: Human-like extraction of structured building contours via multimodal LLMs
IF 12.2, CAS Tier 1 (Earth Science), Q1 GEOGRAPHY, PHYSICAL. Pub Date: 2026-03-01, Epub Date: 2026-01-19. DOI: 10.1016/j.isprsjprs.2026.01.025
Tao Zhang , Shiqing Wei , Shihao Chen , Wenling Yu , Muying Luo , Shunping Ji
Automatically extracting vectorized building contours from remote sensing imagery is crucial for urban planning, population estimation, and disaster assessment. Current state-of-the-art methods rely on complex multi-stage pipelines involving pixel segmentation, vectorization, and polygon refinement, which limits their scalability and real-world applicability. Inspired by the remarkable reasoning capabilities of Large Language Models (LLMs), we introduce VectorLLM, the first Multi-modal Large Language Model (MLLM) designed for regular building contour extraction from remote sensing images. Unlike existing approaches, VectorLLM performs corner-point by corner-point regression of building contours directly, mimicking human annotators’ labeling process. Our architecture consists of a vision foundation backbone, an MLP connector, and an LLM, enhanced with learnable position embeddings to improve spatial understanding capability. Through comprehensive exploration of training strategies including pretraining, supervised fine-tuning, and direct preference optimization across WHU, WHU-Mix, and CrowdAI datasets, VectorLLM outperforms the previous SOTA methods. Remarkably, VectorLLM exhibits strong zero-shot performance on unseen objects including aircraft, water bodies, and oil tanks, highlighting its potential for unified modeling of diverse remote sensing object contour extraction tasks. Overall, this work establishes a new paradigm for vector extraction in remote sensing, leveraging the topological reasoning capabilities of LLMs to achieve both high accuracy and exceptional generalization. All code and weights will be available at https://github.com/zhang-tao-whu/VectorLLM.
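The corner-point-by-corner-point regression can be pictured as serializing a polygon into a token sequence, the way an annotator clicks vertices one at a time. A toy coordinate-quantization scheme makes the idea concrete; the token vocabulary below (`<poly>`, `x_i`, `y_i`) is hypothetical, not VectorLLM's actual one:

```python
def polygon_to_tokens(corners, bins=256, size=512):
    """Quantize each corner onto a bins x bins grid and emit one
    (x, y) token pair per corner, in annotation order."""
    toks = ["<poly>"]
    for x, y in corners:
        qx = min(int(x / size * bins), bins - 1)  # clamp the right/bottom edge
        qy = min(int(y / size * bins), bins - 1)
        toks += [f"x_{qx}", f"y_{qy}"]
    toks.append("</poly>")
    return toks

def tokens_to_polygon(tokens, bins=256, size=512):
    """Inverse mapping back to (approximate) image coordinates; the
    round trip loses at most one grid cell of precision."""
    xy = [int(t.split("_")[1]) for t in tokens if "_" in t]
    scale = size / bins
    return [(xy[i] * scale, xy[i + 1] * scale) for i in range(0, len(xy), 2)]
```

Framing contours this way lets an autoregressive LLM emit vertices directly, which is what removes the segmentation-vectorization-refinement pipeline the abstract criticizes.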
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 233, pp. 55–68, 2026.
Multi-object tracking of vehicles and anomalous states in remote sensing videos: Joint learning of historical trajectory guidance and ID prediction
IF 12.2, CAS Tier 1 (Earth Science), Q1 GEOGRAPHY, PHYSICAL. Pub Date: 2026-03-01, Epub Date: 2026-01-31. DOI: 10.1016/j.isprsjprs.2026.01.038
Bin Wang , Yuan Zhou , Haigang Sui , Guorui Ma , Peng Cheng , Di Wang
Research on multi-object tracking (MOT) of vehicles based on remote sensing video data has achieved breakthrough progress. However, MOT of vehicles in complex scenarios and their anomalous states after being subjected to strong deformation interference remains a huge challenge. This is of great significance for military defense, traffic flow management, vehicle damage assessment, etc. To address this problem, this study proposes an end-to-end MOT method that integrates a joint learning paradigm of historical trajectory guidance and identity (ID) prediction, aiming to bridge the gap between vehicle detection and continuous tracking after anomalous states occurrence. The proposed network framework primarily consists of a Frame Feature Aggregation Module (FFAM) that enhances spatial consistency of objects across consecutive video frames, a Historical Tracklets Flow Encoder (HTFE) that employs Mamba blocks to guide object embedding within potential motion flows based on historical frames, and a Semantic-Consistent Clustering Module (SCM) constructed via sparse attention computation to capture global semantic information. The discriminative features extracted by these modules are fused by a Dual-branch Modulation Fusion Unit (DMFU) to maximize the performance of the model. This study also constructs a new dataset for MOT of vehicles and anomalous states in videos, termed the VAS-MOT dataset. Extensive validation experiments conducted on this dataset demonstrate that the method achieves the highest level of performance, with HOTA and MOTA reaching 68.2% and 71.5%, respectively. Additional validation on the open-source dataset IRTS-AG confirms the strong robustness of the proposed method, showing excellent performance in long-term tracking of small vehicles in infrared videos under complex scenarios, where HOTA and MOTA reached 70.9% and 91.6%, respectively. 
The proposed method provides valuable insights for capturing moving objects and their anomalous states, laying a foundation for further damage assessment.
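For contrast with the learned ID-prediction head above, the classical baseline for MOT association links each existing track to the detection it overlaps most in the next frame. A minimal greedy IoU matcher, illustrative only and not the paper's method:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def associate(tracks, detections, thresh=0.3):
    """Greedy frame-to-frame association: repeatedly match the highest-IoU
    (track, detection) pair until no pair above the threshold remains.
    Returns a dict mapping track index -> detection index."""
    pairs = sorted(((iou(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)), reverse=True)
    matched_t, matched_d, out = set(), set(), {}
    for s, ti, di in pairs:
        if s < thresh or ti in matched_t or di in matched_d:
            continue
        matched_t.add(ti); matched_d.add(di); out[ti] = di
    return out
```

Pure overlap matching breaks exactly where the paper focuses: when a vehicle deforms into an anomalous state, its box and appearance change abruptly, which is why the method instead conditions on historical trajectory flows and predicts IDs directly.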
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 233, pp. 383–406, 2026.
A novel transformer-based CO2 retrieval framework incorporating prior constraint and hierarchical features injection: assessment of transferability for Tansat-2
IF 12.2, CAS Tier 1 (Earth Science), Q1 GEOGRAPHY, PHYSICAL. Pub Date: 2026-03-01, Epub Date: 2026-02-02. DOI: 10.1016/j.isprsjprs.2026.01.039
Lingfeng Zhang , Lu Zhang , Xingying Zhang , Tiantao Cheng , Xifeng Cao , Tongwen Li , Dongdong Liu , Yang Zhang , Yuhan Jiang , Ruohua Hu , Haiyang Dou , Lin Chen
Carbon dioxide (CO2), the primary contributor to global warming, significantly impacts global climate change. Remote sensing is an effective approach for monitoring atmospheric CO2 concentrations. However, the full-physics Optimal Estimation (OE) method commonly used for satellite retrievals is time-consuming and requires advanced equipment. Additionally, traditional deep learning algorithms for satellite CO2 retrieval suffer from limited accuracy and an inability to extrapolate effectively to unseen high values caused by the gradually increasing concentrations over time. Balancing efficiency and extrapolation capability is a critical task, especially for the next generation of large-swath carbon satellites with significantly larger data volumes, such as Tansat-2. In this study, we first employed OCO-2 data from 2020 to construct a Transformer-based architecture integrating a prior constraint and a hierarchical feature injection mechanism for high-precision CO2 retrieval, achieving R, RMSE, and MAPE of 0.939, 0.746 ppm, and 0.132 %. We then evaluated the model's extrapolation capability using OCO-2 data from 2021 to 2024, demonstrating robust performance and strong generalization ability (R = 0.938–0.951, RMSE = 1.083–1.310 ppm, MAPE = 0.208–0.256 %). Finally, we assessed the transferability of the model using simulated Tansat-2 data (August 18, 2020), achieving R = 0.657, RMSE = 1.299 ppm, and MAPE = 0.239 %, indicating effective transfer capability. The proposed model has the potential to provide a feasible solution for rapidly retrieving high-precision CO2, especially for the next generation of large-swath carbon satellites.
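The three metrics the abstract reports (R, RMSE, MAPE) are standard and easy to reproduce; a short sketch assuming XCO2 values in ppm, with R as the Pearson correlation coefficient:

```python
import numpy as np

def retrieval_metrics(y_true, y_pred):
    """R (Pearson correlation), RMSE (same units as the input, ppm here),
    and MAPE (percent), as commonly reported for XCO2 retrieval accuracy."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r = np.corrcoef(y_true, y_pred)[0, 1]
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0
    return r, rmse, mape
```

Note that because background XCO2 sits around 400 ppm, even a 1 ppm RMSE corresponds to a MAPE of only about 0.25 %, which is why the reported MAPE values look so small.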
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 233, pp. 423–436, 2026.