ACMatch: Improving context capture for two-view correspondence learning via adaptive convolution
Xiang Fang, Yifan Lu, Shihua Zhang, Yining Xie, Jiayi Ma
Pub Date: 2024-11-16 | DOI: 10.1016/j.isprsjprs.2024.11.004
Two-view correspondence learning plays a pivotal role in the field of computer vision. However, this task is beset with great challenges stemming from the significant imbalance between true and false correspondences. Recent approaches have started leveraging the inherent filtering properties of convolution to eliminate false matches. Nevertheless, these methods tend to apply convolution in an ad hoc manner without careful design, thereby inheriting the limitations of convolution and hindering performance improvement. In this paper, we propose a novel convolution-based method called ACMatch, which aims to meticulously design convolutional filters to mitigate the shortcomings of convolution and enhance its effectiveness. Specifically, to address the limitation that existing convolutional filters struggle to capture global information effectively due to their limited receptive field, we introduce a strategy that helps them obtain relatively global information by guiding grid points to incorporate more contextual information, thus enabling a global perspective for two-view learning. Furthermore, we recognize that in the context of feature matching, inliers and outliers provide fundamentally different information. Hence, we design an adaptive weighted convolution module that allows the filters to focus more on inliers while ignoring outliers. Extensive experiments across various visual tasks demonstrate the effectiveness, superiority, and generalization ability of ACMatch. Notably, ACMatch attains an AUC@5° of 35.93% on YFCC100M without RANSAC, surpassing the previous state of the art by 5.85 absolute percentage points and exceeding the 35% AUC@5° bar for the first time. Our code is publicly available at https://github.com/ShineFox/ACMatch.
{"title":"ACMatch: Improving context capture for two-view correspondence learning via adaptive convolution","authors":"Xiang Fang , Yifan Lu , Shihua Zhang , Yining Xie , Jiayi Ma","doi":"10.1016/j.isprsjprs.2024.11.004","DOIUrl":"10.1016/j.isprsjprs.2024.11.004","url":null,"abstract":"<div><div>Two-view correspondence learning plays a pivotal role in the field of computer vision. However, this task is beset with great challenges stemming from the significant imbalance between true and false correspondences. Recent approaches have started leveraging the inherent filtering properties of convolution to eliminate false matches. Nevertheless, these methods tend to apply convolution in an ad hoc manner without careful design, thereby inheriting the limitations of convolution and hindering performance improvement. In this paper, we propose a novel convolution-based method called ACMatch, which aims to meticulously design convolutional filters to mitigate the shortcomings of convolution and enhance its effectiveness. Specifically, to address the limitation of existing convolutional filters of struggling to effectively capture global information due to the limited receptive field, we introduce a strategy to help them obtain relatively global information by guiding grid points to incorporate more contextual information, thus enabling a global perspective for two-view learning. Furthermore, we recognize that in the context of feature matching, inliers and outliers provide fundamentally different information. Hence, we design an adaptive weighted convolution module that allows the filters to focus more on inliers while ignoring outliers. Extensive experiments across various visual tasks demonstrate the effectiveness, superiority, and generalization. Notably, ACMatch attains an AUC@5° of 35.93% on YFCC100M without RANSAC, surpassing the previous state-of-the-art by 5.85 absolute percentage points and exceeding the 35% AUC@5° bar for the first time. Our code is publicly available at <span><span>https://github.com/ShineFox/ACMatch</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 466-480"},"PeriodicalIF":10.6,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142658279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A universal adapter in segmentation models for transferable landslide mapping
Ruilong Wei, Yamei Li, Yao Li, Bo Zhang, Jiao Wang, Chunhao Wu, Shunyu Yao, Chengming Ye
Pub Date: 2024-11-15 | DOI: 10.1016/j.isprsjprs.2024.11.006
Efficient landslide mapping is crucial for disaster mitigation and relief. Recently, deep learning methods have shown promising results in landslide mapping using satellite imagery. However, the sample sparsity and geographic diversity of landslides challenge the transferability of deep learning models. In this paper, we propose a universal adapter module that can be seamlessly embedded into existing segmentation models for transferable landslide mapping. The adapter achieves high-accuracy cross-regional landslide segmentation with a small sample set, requiring minimal parameter adjustments. In detail, the pre-trained baseline model freezes its parameters to retain the knowledge learned from the source domain, while the lightweight adapter fine-tunes only a few parameters to learn new landslide features of the target domain. Structurally, we introduce an attention mechanism to enhance the feature extraction of the adapter. To validate the proposed adapter module, 4321 landslide samples were prepared, and the Segment Anything Model (SAM) and other baseline models, along with four transfer strategies, were selected for controlled experiments. In addition, Sentinel-2 satellite imagery of the Himalayas and Hengduan Mountains, located on the southern and southeastern edges of the Tibetan Plateau, was collected for evaluation. The controlled experiments showed that SAM, when combined with our adapter module, achieved a peak mean Intersection over Union (mIoU) of 82.3%. For other baseline models, integrating the adapter improved mIoU by 2.6% to 12.9% compared with traditional strategies for cross-regional landslide mapping. In particular, baseline models with Transformers are more amenable to parameter fine-tuning. Furthermore, the visualized feature maps revealed that fine-tuning shallow encoders achieves better model transfer. Moreover, the proposed adapter effectively extracts landslide features and focuses on spatial and channel domains with significant features. We also quantified the spectral, scale, and shape features of landslides and analyzed their impacts on segmentation results. Our analysis indicates that weak spectral differences, as well as extreme scales and edge shapes, are detrimental to the accuracy of landslide segmentation. Overall, this adapter module provides a new perspective for large-scale transferable landslide mapping.
{"title":"A universal adapter in segmentation models for transferable landslide mapping","authors":"Ruilong Wei , Yamei Li , Yao Li , Bo Zhang , Jiao Wang , Chunhao Wu , Shunyu Yao , Chengming Ye","doi":"10.1016/j.isprsjprs.2024.11.006","DOIUrl":"10.1016/j.isprsjprs.2024.11.006","url":null,"abstract":"<div><div>Efficient landslide mapping is crucial for disaster mitigation and relief. Recently, deep learning methods have shown promising results in landslide mapping using satellite imagery. However, the sample sparsity and geographic diversity of landslides have challenged the transferability of deep learning models. In this paper, we proposed a universal adapter module that can be seamlessly embedded into existing segmentation models for transferable landslide mapping. The adapter can achieve high-accuracy cross-regional landslide segmentation with a small sample set, requiring minimal parameter adjustments. In detail, the pre-trained baseline model freezes its parameters to keep learned knowledge of the source domain, while the lightweight adapter fine-tunes only a few parameters to learn new landslide features of the target domain. Structurally, we introduced an attention mechanism to enhance the feature extraction of the adapter. To validate the proposed adapter module, 4321 landslide samples were prepared, and the Segment Anything Model (SAM) and other baseline models, along with four transfer strategies were selected for controlled experiments. In addition, Sentinel-2 satellite imagery in the Himalayas and Hengduan Mountains, located on the southern and southeastern edges of the Tibetan Plateau was collected for evaluation. The controlled experiments reported that SAM, when combined with our adapter module, achieved a peak mean Intersection over Union (mIoU) of 82.3 %. For other baseline models, integrating the adapter improved mIoU by 2.6 % to 12.9 % compared with traditional strategies on cross-regional landslide mapping. In particular, baseline models with Transformers are more suitable for fine-tuning parameters. Furthermore, the visualized feature maps revealed that fine-tuning shallow encoders can achieve better effects in model transfer. Besides, the proposed adapter can effectively extract landslide features and focus on specific spatial and channel domains with significant features. We also quantified the spectral, scale, and shape features of landslides and analyzed their impacts on segmentation results. Our analysis indicated that weak spectral differences, as well as extreme scale and edge shapes are detrimental to the accuracy of landslide segmentation. Overall, this adapter module provides a new perspective for large-scale transferable landslide mapping.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 446-465"},"PeriodicalIF":10.6,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142658278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contrastive learning for real SAR image despeckling
Yangtian Fang, Rui Liu, Yini Peng, Jianjun Guan, Duidui Li, Xin Tian
Pub Date: 2024-11-15 | DOI: 10.1016/j.isprsjprs.2024.11.003
The use of synthetic aperture radar (SAR) has greatly improved our ability to capture high-resolution terrestrial images under various weather conditions. However, SAR imagery is affected by speckle noise, which distorts image details and hampers subsequent applications. Recent forays into supervised deep learning-based denoising methods, like MRDDANet and SAR-CAM, offer a promising avenue for SAR despeckling. However, they are impeded by the domain gaps between synthetic data and real SAR images. To tackle this problem, we introduce a self-supervised speckle-aware network that utilizes the limited near-real datasets and unlimited synthetic datasets simultaneously, boosting the performance of the downstream despeckling module by teaching it to discriminate the domain gaps between different datasets in the embedding space. Specifically, based on contrastive learning, the speckle-aware network first characterizes the discriminative representations of spatially correlated speckle noise in different images across diverse datasets, which provides priors over diverse speckle and image characteristics. Then, the representations are effectively modulated into a subsequent multi-scale despeckling network to generate authentic despeckled images. In this way, the despeckling module can reconstruct reliable SAR image characteristics by learning from near-real datasets, while generalization performance is guaranteed by simultaneously learning abundant patterns from synthetic datasets. Additionally, a novel excitation aggregation pooling module is inserted into the despeckling network to enhance it further; the module utilizes features at different scales and better preserves spatial details around strong scatterers in real SAR images. Extensive experiments across real SAR datasets from the Sentinel-1, Capella-X, and TerraSAR-X satellites verify the effectiveness of the proposed method over other state-of-the-art methods. Specifically, the proposed method achieves the best PSNR and SSIM values on the near-real Sentinel-1 dataset, with a gain of 0.22 dB in PSNR over MRDDANet and an improvement of 1.3% in SSIM over SAR-CAM. The code is available at https://github.com/YangtianFang2002/CL-SAR-Despeckling.
{"title":"Contrastive learning for real SAR image despeckling","authors":"Yangtian Fang , Rui Liu , Yini Peng , Jianjun Guan , Duidui Li , Xin Tian","doi":"10.1016/j.isprsjprs.2024.11.003","DOIUrl":"10.1016/j.isprsjprs.2024.11.003","url":null,"abstract":"<div><div>The use of synthetic aperture radar (SAR) has greatly improved our ability to capture high-resolution terrestrial images under various weather conditions. However, SAR imagery is affected by speckle noise, which distorts image details and hampers subsequent applications. Recent forays into supervised deep learning-based denoising methods, like MRDDANet and SAR-CAM, offer a promising avenue for SAR despeckling. However, they are impeded by the domain gaps between synthetic data and realistic SAR images. To tackle this problem, we introduce a self-supervised speckle-aware network to utilize the limited near-real datasets and unlimited synthetic datasets simultaneously, which boosts the performance of the downstream despeckling module by teaching the module to discriminate the domain gap of different datasets in the embedding space. Specifically, based on contrastive learning, the speckle-aware network first characterizes the discriminative representations of spatial-correlated speckle noise in different images across diverse datasets, which provides priors of versatile speckles and image characteristics. Then, the representations are effectively modulated into a subsequent multi-scale despeckling network to generate authentic despeckled images. In this way, the despeckling module can reconstruct reliable SAR image characteristics by learning from near-real datasets, while the generalization performance is guaranteed by learning abundant patterns from synthetic datasets simultaneously. Additionally, a novel excitation aggregation pooling module is inserted into the despeckling network to enhance the network further, which utilizes features from different levels of scales and better preserves spatial details around strong scatters in real SAR images. Extensive experiments across real SAR datasets from Sentinel-1, Capella-X, and TerraSAR-X satellites are carried out to verify the effectiveness of the proposed method over other state-of-the-art methods. Specifically, the proposed method achieves the best PSNR and SSIM values evaluated on the near-real Sentinel-1 dataset, with gains of 0.22 dB in PSNR compared to MRDDANet, and improvements of 1.3% in SSIM over SAR-CAM. The code is available at <span><span>https://github.com/YangtianFang2002/CL-SAR-Despeckling</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 376-391"},"PeriodicalIF":10.6,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142658338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MIWC: A multi-temporal image weighted composition method for satellite-derived bathymetry in shallow waters
Zhixin Duan, Liang Cheng, Qingzhou Mao, Yueting Song, Xiao Zhou, Manchun Li, Jianya Gong
Pub Date: 2024-11-15 | DOI: 10.1016/j.isprsjprs.2024.10.009
Satellite-derived bathymetry (SDB) is a vital technique for the rapid and cost-effective measurement of shallow underwater terrain. However, it faces challenges from image noise, including clouds, bubble clouds, and sun glint. Consequently, acquiring complete and accurate bathymetric maps is frequently difficult, particularly in cloudy, rainy, and large-scale regions. In this study, we propose a multi-temporal image weighted composition (MIWC) method. The method performs iterative segmentation and inverse-distance-weighted composition of multi-temporal images based only on the near-infrared (NIR) band of multispectral images to obtain high-quality composite images. It was applied to Sentinel-2 imagery for bathymetry of four representative areas in the South China Sea and the Atlantic Ocean. The results show that the root mean square errors (RMSE) of bathymetry from the composite images using the log-transformed linear model (LLM) and the log-transformed ratio model (LRM) in the water depth range of 0–20 m are 0.67–1.22 m and 0.71–1.23 m, respectively. The RMSE of the bathymetry decreases with the number of images involved in the composition and becomes relatively stable when the number of images reaches approximately 16. In addition, the composite images generated by the MIWC method generally exhibit not only superior visual quality but also significant advantages in bathymetric accuracy and robustness compared with the best single images and with composites generated by the median composition method and the maximum outlier removal method. The recommended value of the power parameter for inverse distance weighting in the MIWC method was experimentally determined to be 4; it typically requires no complex adjustment, making the method easy to apply or integrate. The MIWC method offers a reliable approach to improving the quality of remote sensing images, ensuring the completeness and accuracy of shallow-water bathymetric maps.
{"title":"MIWC: A multi-temporal image weighted composition method for satellite-derived bathymetry in shallow waters","authors":"Zhixin Duan , Liang Cheng , Qingzhou Mao , Yueting Song , Xiao Zhou , Manchun Li , Jianya Gong","doi":"10.1016/j.isprsjprs.2024.10.009","DOIUrl":"10.1016/j.isprsjprs.2024.10.009","url":null,"abstract":"<div><div>Satellite-derived bathymetry (SDB) is a vital technique for the rapid and cost-effective measurement of shallow underwater terrain. However, it faces challenges of image noise, including clouds, bubble clouds, and sun glint. Consequently, the acquisition of no missing and accurate bathymetric maps is frequently challenging, particularly in cloudy, rainy, and large-scale regions. In this study, we propose a multi-temporal image weighted composition (MIWC) method. This method performs iterative segmentation and inverse distance weighted composition of multi-temporal images based only on the near-infrared (NIR) band information of multispectral images to obtain high-quality composite images. The method was applied to scenarios using Sentinel-2 imagery for bathymetry of four representative areas located in the South China Sea and the Atlantic Ocean. The results show that the root mean square error (RMSE) of bathymetry from the composite images using the log-transformed linear model (LLM) and the log-transformed ratio model (LRM) in the water depth range of 0–20 m are 0.67–1.22 m and 0.71–1.23 m, respectively. The RMSE of the bathymetry decreases with the number of images involved in the composition and tends to be relatively stable when the number of images reaches approximately 16. In addition, the composition images generated by the MIWC method generally exhibit not only superior visual quality, but also significant advantages in terms of bathymetric accuracy and robustness when compared to the best single images as well as the composition images generated by the median composition method and the maximum outlier removal method. The recommended value of the power parameter for inverse distance weighting in the MIWC method was experimentally determined to be 4, which typically does not require complex adjustments, making the method easy to apply or integrate. The MIWC method offers a reliable approach to improve the quality of remote sensing images, ensuring the completeness and accuracy of shallow water bathymetric maps.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 430-445"},"PeriodicalIF":10.6,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142658277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Common-feature-track-matching approach for multi-epoch UAV photogrammetry co-registration
Xinlong Li, Mingtao Ding, Zhenhong Li, Peng Cui
Pub Date: 2024-11-14 | DOI: 10.1016/j.isprsjprs.2024.10.025
Automatic co-registration of multi-epoch Unmanned Aerial Vehicle (UAV) image sets remains challenging due to radiometric differences in complex dynamic scenes. Specifically, illumination changes and vegetation variations usually lead to insufficient and spatially unevenly distributed common tie points (CTPs), resulting in under-fitting of the co-registration near areas without CTPs. In this paper, we propose a novel Common-Feature-Track-Matching (CFTM) approach for co-registration of UAV image sets, which alleviates the shortage of CTPs in complex dynamic scenes. Instead of matching features between multi-epoch images, we first search for correspondences between multi-epoch feature tracks (i.e., groups of features corresponding to the same 3D points), which avoids the removal of matches caused by unreliable estimation of the relative pose between inter-epoch image pairs. Then, the CTPs are triangulated from the successfully matched track pairs. Since an even distribution of CTPs is crucial for robust co-registration, a block-based strategy is designed, which also enables parallel computation. Finally, an iterative optimization algorithm is developed to gradually select the best CTPs to refine the poses of the multi-epoch images. We assess the performance of our method on two challenging datasets. The results show that CFTM can automatically acquire adequate and evenly distributed CTPs in complex dynamic scenes, achieving co-registration accuracy approximately four times higher than the state of the art in challenging scenarios. Our code is available at https://github.com/lixinlong1998/CoSfM.
{"title":"Common-feature-track-matching approach for multi-epoch UAV photogrammetry co-registration","authors":"Xinlong Li , Mingtao Ding , Zhenhong Li , Peng Cui","doi":"10.1016/j.isprsjprs.2024.10.025","DOIUrl":"10.1016/j.isprsjprs.2024.10.025","url":null,"abstract":"<div><div>Automatic co-registration of multi-epoch Unmanned Aerial Vehicle (UAV) image sets remains challenging due to the radiometric differences in complex dynamic scenes. Specifically, illumination changes and vegetation variations usually lead to insufficient and spatially unevenly distributed common tie points (CTPs), resulting in under-fitting of co-registration near the areas without CTPs. In this paper, we propose a novel Common-Feature-Track-Matching (CFTM) approach for UAV image sets co-registration, to alleviate the shortage of CTPs in complex dynamic scenes. Instead of matching features between multi-epoch images, we first search correspondences between multi-epoch feature tracks (i.e., groups of features corresponding to the same 3D points), which avoids the removal of matches due to unreliable estimation of the relative pose between inter-epoch image pairs. Then, the CTPs are triangulated from the successfully matched track pairs. Since an even distribution of CTPs is crucial for robust co-registration, a block-based strategy is designed, as well as enabling parallel computation. Finally, an iterative optimization algorithm is developed to gradually select the best CTPs to refine the poses of multi-epoch images. We assess the performance of our method on two challenging datasets. The results show that CFTM can automatically acquire adequate and evenly distributed CTPs in complex dynamic scenes, achieving a high co-registration accuracy approximately four times higher than the state-of-the-art in challenging scenario. Our code is available at <span><span>https://github.com/lixinlong1998/CoSfM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 392-407"},"PeriodicalIF":10.6,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142658339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B3-CDG: A pseudo-sample diffusion generator for bi-temporal building binary change detection
Peng Chen, Peixian Li, Bing Wang, Sihai Zhao, Yongliang Zhang, Tao Zhang, Xingcheng Ding
Pub Date: 2024-11-14 | DOI: 10.1016/j.isprsjprs.2024.10.021
Building change detection (CD) plays a crucial role in urban planning, land resource management, and disaster monitoring. Deep learning has become a key approach to building CD, but challenges persist: obtaining large-scale, accurately registered bi-temporal images is difficult, and annotation is time-consuming. Therefore, we propose B3-CDG, a bi-temporal building binary CD pseudo-sample generator based on the principle of latent diffusion. The generator treats building change processes as local semantic state transformations. It utilizes textual instructions and mask prompts to generate specific class changes in designated regions of single-temporal images, creating different temporal images with clear semantic transitions. B3-CDG is driven by large-scale pretrained models and utilizes external adapters to guide the model in learning remote sensing image distributions. To generate seamless building boundaries, B3-CDG adopts a simple and effective approach, dilation masks, to compel the model to learn boundary details. In addition, B3-CDG incorporates diffusion guidance and data augmentation to enhance image realism. In the generation experiments, B3-CDG achieved the best performance, with the lowest FID (26.40) and the highest IS (4.60), compared with previous baseline methods such as Inpaint and IAug. The method effectively addresses challenges such as boundary continuity, shadow generation, and vegetation occlusion while ensuring that the generated building roof structures and colors are realistic and diverse. In the application experiments, B3-CDG improved the IoU of the validation model (SFFNet) by 6.34% and 7.10% on the LEVIR and WHUCD datasets, respectively. When real data are extremely limited (using only 5% of the original data), the improvements reach 33.68% and 32.40%. Moreover, B3-CDG can raise the baseline performance of advanced CD models such as SNUNet and ChangeFormer. Ablation studies further confirm the effectiveness of the B3-CDG design. This study introduces a novel research paradigm for building CD, potentially advancing the field. Source code and datasets will be available at https://github.com/ABCnutter/B3-CDG.
{"title":"B3-CDG: A pseudo-sample diffusion generator for bi-temporal building binary change detection","authors":"Peng Chen , Peixian Li , Bing Wang , Sihai Zhao , Yongliang Zhang , Tao Zhang , Xingcheng Ding","doi":"10.1016/j.isprsjprs.2024.10.021","DOIUrl":"10.1016/j.isprsjprs.2024.10.021","url":null,"abstract":"<div><div>Building change detection (CD) plays a crucial role in urban planning, land resource management, and disaster monitoring. Currently, deep learning has become a key approach in building CD, but challenges persist. Obtaining large-scale, accurately registered bi-temporal images is difficult, and annotation is time-consuming. Therefore, we propose B<sup>3</sup>-CDG, a bi-temporal building binary CD pseudo-sample generator based on the principle of latent diffusion. This generator treats building change processes as local semantic states transformations. It utilizes textual instructions and mask prompts to generate specific class changes in designated regions of single-temporal images, creating different temporal images with clear semantic transitions. B<sup>3</sup>-CDG is driven by large-scale pretrained models and utilizes external adapters to guide the model in learning remote sensing image distributions. To generate seamless building boundaries, B<sup>3</sup>-CDG adopts a simple and effective approach—dilation masks—to compel the model to learn boundary details. In addition, B<sup>3</sup>-CDG incorporates diffusion guidance and data augmentation to enhance image realism. In the generation experiments, B<sup>3</sup>-CDG achieved the best performance with the lowest FID (26.40) and the highest IS (4.60) compared to previous baseline methods (such as Inpaint and IAug). This method effectively addresses challenges such as boundary continuity, shadow generation, and vegetation occlusion while ensuring that the generated building roof structures and colors are realistic and diverse. In the application experiments, B<sup>3</sup>-CDG improved the IOU of the validation model (SFFNet) by 6.34 % and 7.10 % on the LEVIR and WHUCD datasets, respectively. When the real data is extremely limited (using only 5 % of the original data), the improvement further reaches 33.68 % and 32.40 %. Moreover, B<sup>3</sup>-CDG can enhance the baseline performance of advanced CD models, such as SNUNet and ChangeFormer. Ablation studies further confirm the effectiveness of the B<sup>3</sup>-CDG design. This study introduces a novel research paradigm for building CD, potentially advancing the field. Source code and datasets will be available at <span><span>https://github.com/ABCnutter/B3-CDG</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 408-429"},"PeriodicalIF":10.6,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142658276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mesh refinement method for multi-view stereo with unary operations
Jianchen Liu, Shuang Han, Jin Li
Pub Date: 2024-11-12 | DOI: 10.1016/j.isprsjprs.2024.10.023
3D reconstruction is an important part of the digital city, and high-accuracy 3D modeling methods have been widely studied as an important pathway to visualizing 3D city scenes. However, problems of image resolution, noise, and occlusion result in low quality and overly smooth features in the mesh model. Therefore, the model needs to be refined to improve the mesh quality and enhance the visual effect. This paper proposes a mesh refinement algorithm that fine-tunes the vertices of the mesh and constrains their evolution direction to the normal vector, reducing their degrees of freedom to one. The evolution of each vertex thus involves only a single displacement parameter along the normal vector, simplifying the derivation of the energy function. Meanwhile, Gaussian curvature is used as a regularization term, which is anisotropic and preserves edge features during the reconstruction process. The mesh refinement algorithm with unary operations fully utilizes the original image information and effectively enriches the local detail features of the mesh model. Comparative experiments on five public datasets show that the proposed algorithm restores the detailed features of the model better and achieves a stronger refinement effect in the same number of iterations than the OpenMVS library refinement algorithm. Moreover, with fewer iterations, the proposed algorithm achieves more desirable results.
{"title":"Mesh refinement method for multi-view stereo with unary operations","authors":"Jianchen Liu, Shuang Han, Jin Li","doi":"10.1016/j.isprsjprs.2024.10.023","DOIUrl":"10.1016/j.isprsjprs.2024.10.023","url":null,"abstract":"<div><div>3D reconstruction is an important part of digital city, high-accuracy 3D modeling method has been widely studied as an important pathway to visualizing 3D city scenes. However, the problems of image resolution, noise, and occlusion result in low quality and smooth features in the mesh model. Therefore, the model needs to be refined to improve the mesh quality and enhance the visual effect. This paper proposes a mesh refinement algorithm to fine-tune the vertices of the mesh and constrain their evolution direction on the normal vector, reducing their freedom degrees to one. The evolution of vertices only involves one motion distance parameter on the normal vector, simplifying the complexity of the energy function derivation. Meanwhile, Gaussian curvature is used as a regularization term, which is anisotropic and preserves the edge features during the reconstruction process. The mesh refinement algorithm with unary operations fully utilizes the original image information and effectively enriches the local detail features of the mesh model. This paper utilizes five public datasets to conduct comparative experiments, and the experimental results show that the proposed algorithm can better restore the detailed features of the model and has a better refinement effect in the same number of iterations compared with OpenMVS library refinement algorithm. At the same time, in the comparison of refinement results with fewer iterations, the algorithm in this paper can achieve more desirable results.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 361-375"},"PeriodicalIF":10.6,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142658336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast and accurate SAR geocoding with a plane approximation
Shaokun Guo, Jie Dong, Yian Wang, Mingsheng Liao
Pub Date: 2024-11-11 | DOI: 10.1016/j.isprsjprs.2024.10.031
Geocoding is the procedure of finding the mapping between a Synthetic Aperture Radar (SAR) image and the imaged scene. The inverse form of the Range-Doppler (RD) model has commonly been adopted to approximate geocoding results. However, with advances in SAR imaging geodesy, its imprecise nature has become more perceptible. The forward RD model gives reliable solutions but is time-consuming and unable to detect geometric distortions. This study proposes a highly optimized forward geocoding method that finds the precise ground position of each image sample with a Digital Elevation Model (DEM). By following the intersection of the terrain with the so-called solution surface of an azimuth line, which can be locally approximated by a plane, it produces geolocation results almost identical to the analytical solutions of the RD model. At the same time, non-unique geocoding solutions and geometric distortions are determined. Deviations from the employed approximations are assessed, showing that they are highly predictable and lead to negligible range/azimuth residuals. The general robustness is verified by experiments on SAR images of different resolutions covering diversified terrains in the native or zero-Doppler geometry. Comparisons with other forward algorithms demonstrate that, with the extra ability to detect geometric distortions, its accuracy and efficiency remain comparable. For a Sentinel-1 IW burst of high topographic relief, the algorithm finishes in 3 s using 16 parallel cores, with an average residual smaller than one millimeter. Its blend of efficiency, accuracy, and geometric distortion detection makes it well suited to large-scale remote sensing applications.
{"title":"Fast and accurate SAR geocoding with a plane approximation","authors":"Shaokun Guo , Jie Dong , Yian Wang , Mingsheng Liao","doi":"10.1016/j.isprsjprs.2024.10.031","DOIUrl":"10.1016/j.isprsjprs.2024.10.031","url":null,"abstract":"<div><div>Geocoding is the procedure of finding the mapping between the Synthetic Aperture Radar (SAR) image and the imaged scene. The inverse form of the Range-Doppler (RD) model has been adopted to approximate the geocoding results. However, with advances in SAR imaging geodesy, its imprecise nature becomes more perceptible. The forward RD model gives reliable solutions but is time-consuming and unable to detect geometric distortions. This study proposes a highly optimized forward geocoding method to find the precise ground position of each image sample with a Digital Elevation Model (DEM). By following the intersection of the terrain and the so-called solution surface of an azimuth line, which can be locally approximated by a plane, it produces geo-location results almost identical to the analytical solutions of the RD model. At the same time, the non-unique geocoding solutions and the geometric distortions are determined. Deviations from the employed approximations are assessed, showing that they are highly predictable and lead to negligible range/azimuth residuals. The general robustness is verified by experiments on SAR images of different resolutions covering diversified terrains in the native or zero Doppler geometry. Comparisons with other forward algorithms demonstrate that, with extra geometric distortions detection ability, its accuracy and efficiency are comparable to them. For a Sentinel-1 IW burst of high topographic relief, the algorithm ends in a 3 s using 16 parallel cores, with an average residual smaller than one millimeter. Its impressive blend of efficiency, accuracy, and geometric distortion detection capabilities makes it ideal for large-scale remote sensing applications.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 344-360"},"PeriodicalIF":10.6,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142658333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3D point cloud regularization method for uniform mesh generation of mining excavations
Przemysław Dąbek, Jacek Wodecki, Paulina Kujawa, Adam Wróblewski, Arkadiusz Macek, Radosław Zimroz
Pub Date: 2024-11-09 | DOI: 10.1016/j.isprsjprs.2024.10.024
Mine excavation systems are usually dozens of kilometers long, with geometry varying on a small scale (roughness and shape of the walls) and on a large scale (varying tunnel widths, turns, and crossings). In this article, the authors address the problem of analyzing laser scanning data from large mining structures for various purposes, with a focus on ventilation simulations. Together with the quality of the measurement data (diverse point-cloud density, missing samples, holes induced by obstructions in the field of view, measurement noise), this creates problems that require multi-stage processing of the obtained data. The authors propose a robust methodology to process a single segmented section of the mining system. The presented approach focuses on obtaining a point cloud ready for computational fluid dynamics (CFD) analysis of airflow with minimal need for additional manual corrections on the generated mesh model. This requires the point cloud to have evenly distributed points and reduced noise (together with removal of objects inside) while keeping the unique geometrical properties and shape of the scanned tunnels. The proposed methodology uses the trajectory of the excavation, either obtained during the measurements or derived by the skeletonization process explained in the article. Cross-sections obtained on planes perpendicular to the trajectory are processed to equalize the point distribution and to remove measurement noise, holes in the point cloud, and objects inside the excavation. The proposed algorithm is validated by comparing the processed cloud with the original cloud and by testing within the CFD environment. The algorithm proved highly effective in improving the skewness rate of the obtained mesh and the geometry mapping accuracy (standard deviation below 5 cm in cloud-to-mesh comparison).
{"title":"3D point cloud regularization method for uniform mesh generation of mining excavations","authors":"Przemysław Dąbek, Jacek Wodecki, Paulina Kujawa, Adam Wróblewski, Arkadiusz Macek, Radosław Zimroz","doi":"10.1016/j.isprsjprs.2024.10.024","DOIUrl":"10.1016/j.isprsjprs.2024.10.024","url":null,"abstract":"<div><div>Mine excavation systems are usually dozens of kilometers long with varying geometry on a small scale (roughness and shape of the walls) and on a large scale (varying widths of the tunnels, turns, and crossings). In this article, the authors address the problem of analyzing laser scanning data from large mining structures that can be used for various purposes, with focus on ventilation simulations. Together with the quality of the measurement data (diverse point-cloud density, missing samples, holes induced by obstructions in the field of view, measurement noise), this creates problems that require multi-stage processing of the obtained data. The authors propose a robust methodology to process a single segmented section of the mining system. The presented approach focuses on obtaining a point cloud ready for application in the computational fluid dynamics (CFD) analysis of airflow with minimal need for additional manual corrections on the generated mesh model. This requires the point cloud to have evenly distributed points and reduced noise (together with removal of objects inside) while keeping the unique geometrical properties and shape of the scanned tunnels. Proposed methodology uses trajectory of the excavation either obtained during the measurements or by skeletonization process explained in the article. Cross-sections obtained on planes perpendicular to the trajectory are processed towards the equalization of point distribution, removing measurement noise, holes in the point cloud and objects inside the excavation. The effects of the proposed algorithm are validated by comparing the processed cloud with the original cloud and testing within the CFD environment. The algorithm proved high effectiveness in improving skewness rate of the obtained mesh and geometry mapping accuracy (standard deviation below 5 centimeters in cloud-to-mesh comparison).</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 324-343"},"PeriodicalIF":10.6,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142658335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalization in deep learning-based aircraft classification for SAR imagery
Andrea Pulella, Francescopaolo Sica, Carlos Villamil Lopez, Harald Anglberger, Ronny Hänsch
Pub Date: 2024-11-08 | DOI: 10.1016/j.isprsjprs.2024.10.030
Automatic Target Recognition (ATR) from Synthetic Aperture Radar (SAR) data covers a wide range of applications. SAR ATR helps to detect and track vehicles and other objects, e.g., in disaster relief and surveillance operations. Aircraft classification covers a significant part of this research area and differs from other SAR-based ATR tasks, such as ship and ground vehicle detection and classification, in that aircraft are usually static targets, often remaining at the same location and in a given orientation for longer time frames. Today, there is a significant mismatch between the abundance of deep learning-based aircraft classification models and the availability of corresponding datasets. This mismatch has led to models with improved classification performance on specific datasets, but the challenge of generalizing to conditions not present in the training data (which are expected to occur operationally) has not yet been satisfactorily analyzed. This paper evaluates how the classification performance and generalization capabilities of deep learning models are influenced by the diversity of the training dataset. Our goal is to understand the model's competence and the conditions under which it can achieve proficiency in aircraft classification for high-resolution SAR images while generalizing to novel data that include different geographic locations, environmental conditions, and geometric variations. We address this gap by using manually annotated high-resolution SAR data from TerraSAR-X and TanDEM-X and show how classification performance changes for different application scenarios requiring different training and evaluation setups. We find that, as expected, the type of aircraft plays a crucial role in the classification problem, since aircraft vary in shape and dimension. However, these aspects are secondary to how the SAR image is acquired, with the acquisition geometry playing the primary role. Therefore, we find that the characteristics of the acquisition are much more relevant for generalization than the complex geometry of the target. We show this for various models selected from among standard classification algorithms.
{"title":"Generalization in deep learning-based aircraft classification for SAR imagery","authors":"Andrea Pulella , Francescopaolo Sica , Carlos Villamil Lopez , Harald Anglberger , Ronny Hänsch","doi":"10.1016/j.isprsjprs.2024.10.030","DOIUrl":"10.1016/j.isprsjprs.2024.10.030","url":null,"abstract":"<div><div>Automatic Target Recognition (ATR) from Synthetic Aperture Radar (SAR) data covers a wide range of applications. SAR ATR helps to detect and track vehicles and other objects, e.g. in disaster relief and surveillance operations. Aircraft classification covers a significant part of this research area, which differs from other SAR-based ATR tasks, such as ship and ground vehicle detection and classification, in that aircrafts are usually a static target, often remaining at the same location and in a given orientation for longer time frames. Today, there is a significant mismatch between the abundance of deep learning-based aircraft classification models and the availability of corresponding datasets. This mismatch has led to models with improved classification performance on specific datasets, but the challenge of generalizing to conditions not present in the training data (which are expected to occur in operational conditions) has not yet been satisfactorily analyzed. This paper aims to evaluate how classification performance and generalization capabilities of deep learning models are influenced by the diversity of the training dataset. Our goal is to understand the model’s competence and the conditions under which it can achieve proficiency in aircraft classification tasks for high-resolution SAR images while demonstrating generalization capabilities when confronted with novel data that include different geographic locations, environmental conditions, and geometric variations. We address this gap by using manually annotated high-resolution SAR data from TerraSAR-X and TanDEM-X and show how the classification performance changes for different application scenarios requiring different training and evaluation setups. We find that, as expected, the type of aircraft plays a crucial role in the classification problem, since it will vary in shape and dimension. However, these aspects are secondary to how the SAR image is acquired, with the acquisition geometry playing the primary role. Therefore, we find that the characteristics of the acquisition are much more relevant for generalization than the complex geometry of the target. We show this for various models selected among the standard classification algorithms.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 312-323"},"PeriodicalIF":10.6,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142658334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}