Multiscale Low-Rank and Sparse Attention-Based Transformer for Hyperspectral Image Classification
Pub Date: 2025-08-22 | DOI: 10.1109/LGRS.2025.3601670 | IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5
Jinliang An;Longlong Dai;Muzi Wang;Weidong Zhang
Recently, transformer-based approaches have emerged as powerful tools for hyperspectral image (HSI) classification. HSI inherently exhibits low-rank and sparse properties due to spatial continuity and spectral redundancy. However, most existing methods directly adopt standard transformer architectures, overlooking the distinctive priors inherent in HSI, which limits the classification performance and modeling efficiency. To address these challenges, this letter proposes a multiscale low-rank and sparse transformer (MLSFormer) that effectively integrates both low-rank and sparse priors. Specifically, we leverage tensor low-rank decomposition (TLRD) to factorize the query, key, and value matrices into low-rank tensor products, capturing dominant low-rank structures. In parallel, we introduce a sparse attention mechanism to retain only the most important connections. Furthermore, a multiscale attention mechanism is designed to hierarchically partition attention heads into global, medium, and local groups, each assigned tailored decomposition ranks and sparsity ratios, enabling comprehensive multiscale feature extraction. Extensive experiments on three benchmark datasets demonstrate that MLSFormer achieves superior classification performance compared to state-of-the-art methods.
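To make the two priors concrete, here is a minimal single-head sketch in PyTorch: the projections are constrained to low rank through a two-factor matrix product (a stand-in for the letter's tensor low-rank decomposition), and the attention map is sparsified by keeping only the top-k scores per query. Shapes, the rank, and k are illustrative assumptions; a full MLSFormer would additionally split heads into global, medium, and local groups with different (rank, k) pairs.

```python
import torch
import torch.nn.functional as F

def lowrank_sparse_attention(x, rank=8, keep=16):
    # x: (batch, tokens, dim). Illustrative shapes only.
    b, n, d = x.shape
    def lowrank_proj():
        # Two-factor product caps the projection at rank `rank`
        # (a matrix stand-in for the paper's tensor decomposition);
        # in a real model these factors would be learned parameters.
        return torch.randn(d, rank) / d**0.5, torch.randn(rank, d) / rank**0.5
    (Uq, Vq), (Uk, Vk), (Uv, Vv) = lowrank_proj(), lowrank_proj(), lowrank_proj()
    q, k, v = x @ Uq @ Vq, x @ Uk @ Vk, x @ Uv @ Vv
    scores = q @ k.transpose(-2, -1) / d**0.5          # (b, n, n)
    # Sparse prior: keep only the `keep` strongest connections per query.
    top = scores.topk(min(keep, n), dim=-1)
    sparse = torch.full_like(scores, float('-inf'))
    sparse.scatter_(-1, top.indices, top.values)
    return F.softmax(sparse, dim=-1) @ v

out = lowrank_sparse_attention(torch.randn(2, 64, 32))  # (2, 64, 32)
```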
{"title":"Multiscale Low-Rank and Sparse Attention-Based Transformer for Hyperspectral Image Classification","authors":"Jinliang An;Longlong Dai;Muzi Wang;Weidong Zhang","doi":"10.1109/LGRS.2025.3601670","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3601670","url":null,"abstract":"Recently, transformer-based approaches have emerged as powerful tools for hyperspectral image (HSI) classification. HSI inherently exhibits low-rank and sparse properties due to spatial continuity and spectral redundancy. However, most existing methods directly adopt standard transformer architectures, overlooking the distinctive priors inherent in HSI, which limits the classification performance and modeling efficiency. To address these challenges, this letter proposes a multiscale low-rank and sparse transformer (MLSFormer) that effectively integrates both low-rank and sparse priors. Specifically, we leverage tensor low-rank decomposition (TLRD) to factorize the query, key, and value matrices into low-rank tensor products, capturing dominant low-rank structures. In parallel, we introduce a sparse attention mechanism to retain only the most important connections. Furthermore, a multiscale attention mechanism is designed to hierarchically partition attention heads into global, medium, and local groups, each assigned tailored decomposition ranks and sparsity ratios, enabling comprehensive multiscale feature extraction. Extensive experiments on three benchmark datasets demonstrate that MLSFormer achieves superior classification performance compared to state-of-the-art methods.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144914211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MACNet: A Multiscale Attention-Guided Contextual Network for Hyperspectral Anomaly Detection
Pub Date: 2025-08-22 | DOI: 10.1109/LGRS.2025.3601600 | IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5
Yuquan Gan;Xingyu Li;Siyu Wu;Mengjiao Wang
Hyperspectral anomaly detection (HAD) aims to identify anomalous targets that differ from the background in high-dimensional spectral images, and is widely applied in fields such as military reconnaissance and environmental monitoring. However, the diversity of anomaly scales, interference from complex backgrounds, and redundancy of spectral information pose significant challenges to achieving high detection accuracy. To address these issues, this letter proposes a multiscale attention-guided context network (MACNet) to enhance the perception of anomalous regions. MACNet consists of three components: a multiscale local feature extractor (MSLFE) that effectively captures edge structures and subtle anomalies at different scales, a global context awareness module (GCAM) that fuses local and global contextual information to improve discrimination under complex backgrounds, and a refined reconstruction and contrast enhancement module (RRCE) that employs channel attention and spatial reconstruction mechanisms to enhance the response differences between anomalies and background. Experiments on four publicly available hyperspectral datasets demonstrate that MACNet achieves superior detection accuracy compared to existing mainstream methods, validating the effectiveness of the proposed approach.
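As a hedged illustration of the multiscale idea behind MSLFE (not the authors' exact module), a block that runs parallel convolutions with growing receptive fields and fuses them with a 1x1 convolution could look like this; channel counts and kernel sizes are assumptions:

```python
import torch
import torch.nn as nn

class MultiScaleLocalBlock(nn.Module):
    """Parallel 1x1/3x3/5x5 branches capture edges and subtle anomalies at
    several scales; a 1x1 convolution fuses them. A stand-in for MSLFE."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c_in, c_out, k, padding=k // 2) for k in (1, 3, 5)
        )
        self.fuse = nn.Conv2d(3 * c_out, c_out, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

y = MultiScaleLocalBlock(30, 16)(torch.randn(1, 30, 64, 64))  # (1, 16, 64, 64)
```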
{"title":"MACNet: A Multiscale Attention-Guided Contextual Network for Hyperspectral Anomaly Detection","authors":"Yuquan Gan;Xingyu Li;Siyu Wu;Mengjiao Wang","doi":"10.1109/LGRS.2025.3601600","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3601600","url":null,"abstract":"Hyperspectral anomaly detection (HAD) aims to identify anomalous targets that differ from the background in high-dimensional spectral images, and is widely applied in fields such as military reconnaissance and environmental monitoring. However, the diversity of anomaly scales, interference from complex backgrounds, and redundancy of spectral information pose significant challenges to achieving high detection accuracy. To address these issues, this letter proposes a multiscale attention-guided context network (MACNet) to enhance the perception of anomalous regions. MACNet consists of three components: a multiscale local feature extractor (MSLFE) that effectively captures edge structures and subtle anomalies at different scales, a global context awareness module (GCAM) that fuses local and global contextual information to improve discrimination under complex backgrounds, and a refined reconstruction and contrast enhancement module (RRCE) that employs channel attention and spatial reconstruction mechanisms to enhance the response differences between anomalies and background. Experiments on four publicly available hyperspectral datasets demonstrate that MACNet achieves superior detection accuracy compared to existing mainstream methods, validating the effectiveness of the proposed approach.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Partial Attention Feature Aggregation Network for Lightweight Remote Sensing Image Super-Resolution
Pub Date: 2025-08-22 | DOI: 10.1109/LGRS.2025.3601595 | IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5
Wei Xue;Tiancheng Shao;Mingyang Du;Xiao Zheng;Ping Zhong
Most lightweight super-resolution networks aim to improve performance by introducing an attention mechanism and to reduce model parameters by designing lightweight convolutional layers. However, introducing an attention mechanism often increases the number of parameters, and lightweight convolutional layers have a limited receptive field that cannot effectively capture long-range dependencies. In this letter, we design a novel lightweight base module called partial attention convolution (PAConv) and develop three variants of PAConv with different receptive fields to collaboratively exploit nonlocal information. Based on PAConv, we further propose a lightweight super-resolution network called the partial attention feature aggregation network (PAFAN). Specifically, we arrange the PAConv variants in a progressive iterative manner to form the attention progressive feature distillation block (APFDB), which gradually refines the distilled features. Furthermore, we construct a multilevel aggregation spatial attention (MASA) module by stacking the PAConv variants to systematically coordinate multiscale structural information. Extensive experiments on benchmark datasets show that PAFAN achieves an optimal balance between reconstruction quality and computational efficiency. In particular, with only 123K parameters and 0.49G FLOPs, PAFAN maintains performance comparable to that of state-of-the-art (SOTA) methods.
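A minimal sketch of the "partial" idea, assuming a FasterNet-style channel split: convolve only a fraction of the channels and reweight them with a lightweight channel gate, letting the remaining channels pass through for free. The 1/4 ratio and gate design are assumptions, not the paper's exact PAConv.

```python
import torch
import torch.nn as nn

class PartialAttentionConv(nn.Module):
    """Convolution plus cheap attention on a channel slice; the untouched
    channels are concatenated back, keeping parameters and FLOPs low."""
    def __init__(self, channels, ratio=0.25, kernel=3):
        super().__init__()
        self.c_part = max(1, int(channels * ratio))
        self.conv = nn.Conv2d(self.c_part, self.c_part, kernel, padding=kernel // 2)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(self.c_part, self.c_part, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        part, rest = x[:, :self.c_part], x[:, self.c_part:]
        part = self.conv(part)
        part = part * self.gate(part)   # channel attention at negligible cost
        return torch.cat([part, rest], dim=1)

y = PartialAttentionConv(64)(torch.randn(1, 64, 48, 48))  # (1, 64, 48, 48)
```

Variants with different receptive fields would simply change `kernel`, which is how the three PAConv flavors described above could differ.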
{"title":"Partial Attention Feature Aggregation Network for Lightweight Remote Sensing Image Super-Resolution","authors":"Wei Xue;Tiancheng Shao;Mingyang Du;Xiao Zheng;Ping Zhong","doi":"10.1109/LGRS.2025.3601595","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3601595","url":null,"abstract":"Most lightweight super-resolution networks are designed to improve performance by introducing an attention mechanism and to reduce model parameters by designing lightweight convolutional layers. However, the introduction of the attention mechanism often leads to an increase in the number of parameters. In addition, the lightweight convolutional layer has a limited receptive field and cannot effectively capture long-range dependencies. In this letter, we design a novel lightweight base module called partial attention convolution (PAConv) and develop three variants of PAConv with different receptive fields to collaboratively exploit nonlocal information. Based on PAConv, we further propose a lightweight super-resolution network called partial attention feature aggregation network (PAFAN). Specifically, we arrange the PAConv variants in a progressive iterative manner to form the attention progressive feature distillation block (APFDB), which aims to gradually optimize the distilled features. Furthermore, we construct a multilevel aggregation spatial attention (MASA) via a stacking of the PAConv variants to systematically coordinate multiscale structural information. Extensive experiments conducted on benchmark datasets show that PAFAN achieves an optimal balance between reconstruction quality and computational efficiency. In particular, with only 123 K parameters and 0.49G FLOPs, PAFAN can maintain a performance comparable to that of SOTA methods.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144914216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G2L2Net: A Road Extraction Method for Remote Sensing Images via Gated Global–Local Linear Attention
Pub Date: 2025-08-22 | DOI: 10.1109/LGRS.2025.3601585 | IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5
Zhilin Qu;Mingzhe Li;Chenggong Wang;Zehua Chen
Road extraction from remote sensing imagery plays a pivotal role in a wide range of geospatial and urban applications. Nevertheless, the task remains inherently challenging due to the intricate morphological variations of roads and the frequent occlusions and interference caused by complex background environments. To address these challenges, we propose a road extraction network based on gated global–local linear attention (G²L²Attention). First, we design a linear input-dependent deformable convolution (LID2Conv), which adaptively modulates convolution offsets and weights in a content-aware manner. In addition, we design a top-K-based sparse gated weight (TGW) and use this gate as a shared weight that multiplies both the local and global information streams to form G²L²Attention. Local information is obtained by LID2Conv, while global information is obtained via 2-D selective scan (SS2D). These two pathways are integrated through the proposed G²L²Attention, enabling an efficient and consistent fusion of hierarchical spatial features, and the extracted features are then passed to the decoder. This approach improves road detail representation and provides accurate contextual information. Experiments conducted on three public road datasets demonstrate that G2L2Net outperforms existing methods across various evaluation metrics. Our source code is available at https://github.com/ZehuaChenLab
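A toy version of the shared top-K gate, under the assumption that channels are scored from both branches and only the k strongest survive; this is the TGW spirit, not the authors' exact formulation:

```python
import torch

def gated_fusion(local_feat, global_feat, k=8):
    """Score each channel from both branches, keep the top-k scores, and
    apply the resulting sparse gate as a shared weight on both pathways."""
    b, c, h, w = local_feat.shape
    scores = local_feat.mean(dim=(2, 3)) + global_feat.mean(dim=(2, 3))  # (b, c)
    top = scores.topk(min(k, c), dim=1)
    gate = torch.zeros_like(scores)                 # non-selected channels stay 0
    gate.scatter_(1, top.indices, torch.sigmoid(top.values))
    gate = gate.view(b, c, 1, 1)
    return gate * (local_feat + global_feat)        # shared weight on both streams

fused = gated_fusion(torch.randn(2, 32, 64, 64), torch.randn(2, 32, 64, 64))
```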
{"title":"G2L2Net: A Road Extraction Method for Remote Sensing Images via Gated Global–Local Linear Attention","authors":"Zhilin Qu;Mingzhe Li;Chenggong Wang;Zehua Chen","doi":"10.1109/LGRS.2025.3601585","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3601585","url":null,"abstract":"Road extraction from remote sensing imagery plays a pivotal role in a wide range of geospatial and urban applications. Nevertheless, this task remains inherently challenging due to the intricate morphological variations of roads and frequent occlusions or interference caused by complex background environments. To address these challenges, we propose a road extraction network based on gated global–local linear attention (G<inline-formula> <tex-math>$^2$ </tex-math></inline-formula>L<inline-formula> <tex-math>$^2$ </tex-math></inline-formula>Attention). First, we introduce a linear deformable convolution and design a linear input-dependent deformable convolution (LID2Conv), which adaptively modulates convolution offsets and weights in a content-aware manner. In addition, we design a top-K-based sparse gated weight (TGW). We use this gated mechanism as a shared weight to multiply with local and global information to achieve G2L2Attention. Local information is obtained by LID2Conv, and we gain global information by introducing 2-D selective scan (SS2D). These two pathways are integrated through the proposed G2L2Attention, enabling an efficient and consistent fusion of hierarchical spatial features. The extracted features are passed to the decoder. This approach improves road detail representation and provides accurate contextual information. Experiments conducted on three public road datasets demonstrate that G2L2Net outperforms the existing methods in various evaluation metrics. Our source code is available at <uri>https://github.com/ZehuaChenLab</uri>","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145061937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Crossformer-Based Method for Sea Surface Height Prediction Using Delay–Doppler Map Feature Points
Pub Date: 2025-08-21 | DOI: 10.1109/LGRS.2025.3601112 | IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5
Jin Xing;Feng Wang;Dongkai Yang;Chuanrui Tan;Xiangchao Ma;Wenqian Chen;Guangmiao Ji
Global navigation satellite system-reflectometry (GNSS-R) provides an effective remote sensing technique for accurate retrieval of sea surface height (SSH) measurements. However, accuracy is severely affected by environmental disturbances such as wind-induced sea clutter and wave interference, degrading delay–Doppler map (DDM)-derived measurements. In this study, we propose an advanced trajectory-based deep learning model, Crossformer, explicitly designed to capture temporal dependencies inherent in GNSS-R sequential data. The method leverages five distinct DDM features: peak power point (PPP), maximum slope point (MSP), center pixel intensity (CPI), average power point (APP), and kurtosis (KUR). A dimension-segmentwise (DSW) embedding technique combined with a two-stage attention (TSA) mechanism effectively models both temporal and cross-dimensional correlations. Evaluation using CYGNSS data validated against Jason-3 Level 2 measurements demonstrates the superior performance of our approach, yielding a root mean square error (RMSE) of 0.93 m, mean absolute error (MAE) of 0.65 m, and a coefficient of determination (R²) of 0.9901. Comparative analyses with baseline methods confirm significant improvements in robustness and predictive accuracy, particularly across varying sea states. This research underscores the potential of advanced temporal modeling techniques in GNSS-R altimetry applications.
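Before any sequence modeling, each DDM is reduced to these five scalars. A hedged NumPy sketch, assuming a CYGNSS-style 17x11 delay-Doppler grid; the function name and the MSP proxy (maximum gradient magnitude along delay) are assumptions, since the letter's exact definitions are not reproduced here:

```python
import numpy as np
from scipy.stats import kurtosis

def ddm_features(ddm):
    """Compute five scalar descriptors from one delay-Doppler map."""
    flat = ddm.ravel()
    ppp = flat.max()                                  # peak power point
    msp = np.abs(np.gradient(ddm, axis=0)).max()      # maximum slope point (proxy)
    cpi = ddm[ddm.shape[0] // 2, ddm.shape[1] // 2]   # center pixel intensity
    app = flat.mean()                                 # average power point
    kur = kurtosis(flat)                              # kurtosis of power values
    return np.array([ppp, msp, cpi, app, kur])

feats = ddm_features(np.random.rand(17, 11))  # 17 delay bins x 11 Doppler bins
```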
{"title":"A Crossformer-Based Method for Sea Surface Height Prediction Using Delay–Doppler Map Feature Points","authors":"Jin Xing;Feng Wang;Dongkai Yang;Chuanrui Tan;Xiangchao Ma;Wenqian Chen;Guangmiao Ji","doi":"10.1109/LGRS.2025.3601112","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3601112","url":null,"abstract":"Global navigation satellite system-reflectometry (GNSS-R) provides an effective remote sensing technique for accurate retrieval of sea surface height (SSH) measurements. However, accuracy is severely affected by environmental disturbances such as wind-induced sea clutter and wave interference, degrading delay–Doppler map (DDM)-derived measurements. In this study, we propose an advanced trajectory-based deep learning model, Crossformer, explicitly designed to capture temporal dependencies inherent in GNSS-R sequential data. The method leverages five distinct DDM features: peak power point (PPP), maximum slope point (MSP), center pixel intensity (CPI), average power point (APP), and kurtosis (KUR). A dimension-segmentwise (DSW) embedding technique combined with a two-stage attention (TSA) mechanism effectively models both temporal and cross-dimensional correlations. Evaluation using CYGNSS data validated against Jason-3 Level 2 measurements demonstrates the superior performance of our approach, yielding a root mean square error (RMSE) of 0.93 m, mean absolute error (MAE) of 0.65 m, and a coefficient of determination (<inline-formula> <tex-math>$R^{2}$ </tex-math></inline-formula>) of 0.9901. Comparative analyses with baseline methods confirm significant improvements in robustness and predictive accuracy, particularly across varying sea states. This research underscores the potential of advanced temporal modeling techniques in GNSS-R altimetry applications.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ProFus: Progressive Radar–Vision Heterogeneous Modality Fusion for Maritime Target Detection
Pub Date: 2025-08-21 | DOI: 10.1109/LGRS.2025.3601131 | IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5
Jingang Wang;Shikai Wu;Peng Liu
Maritime monitoring is crucial in both civilian and military applications, with shore-based radar and visual systems widely used due to their cost effectiveness. However, single-sensor methods have notable limitations: radar systems, while offering wide detection coverage, suffer from high false alarm rates and lack detailed target information, whereas visual systems provide rich details but perform poorly in adverse weather conditions such as rain and fog. To address these issues, this letter proposes a progressive radar–vision fusion method for surface target detection. Due to the significant differences in data characteristics between radar and visual sensors, direct fusion is nearly infeasible. Instead, the proposed method adopts a stepwise fusion strategy, consisting of coordinate calibration, shallow feature fusion, and deep feature integration. Experimental results show that this approach achieves an mAP50 of 86.7% and an mAP75 of 54.5%, outperforming YOLOv10 by 1.0% and 1.5%, respectively. Moreover, the proposed method significantly surpasses existing state-of-the-art radar–vision fusion approaches, demonstrating its superior effectiveness in complex environments.
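The first stage, coordinate calibration, can be sketched in miniature: project a radar plot at (range, azimuth) onto the image plane through a planar homography. The 3x3 matrix below is a hypothetical placeholder; in practice it would come from offline calibration, which the letter does not detail here.

```python
import numpy as np

def radar_to_image(rng_m, azimuth_deg, homography):
    """Map a radar detection (range in meters, azimuth in degrees)
    to pixel coordinates via a ground-plane homography."""
    x = rng_m * np.sin(np.deg2rad(azimuth_deg))   # radar ground-plane coords
    y = rng_m * np.cos(np.deg2rad(azimuth_deg))
    u, v, s = homography @ np.array([x, y, 1.0])
    return np.array([u / s, v / s])               # pixel (u, v)

H = np.array([[2.0, 0.0, 320.0],                  # hypothetical calibration
              [0.0, -2.0, 240.0],
              [0.0, 0.0, 1.0]])
px = radar_to_image(150.0, 30.0, H)
```

Once radar plots land in image coordinates, the later shallow and deep feature-fusion stages can operate on spatially aligned inputs.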
{"title":"ProFus: Progressive Radar–Vision Heterogeneous Modality Fusion for Maritime Target Detection","authors":"Jingang Wang;Shikai Wu;Peng Liu","doi":"10.1109/LGRS.2025.3601131","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3601131","url":null,"abstract":"Maritime monitoring is crucial in both civilian and military applications, with shore-based radar and visual systems widely used due to their cost effectiveness. However, single-sensor methods have notable limitations: radar systems, while offering wide detection coverage, suffer from high false alarm rates and lack detailed target information, whereas visual systems provide rich details but perform poorly in adverse weather conditions such as rain and fog. To address these issues, this letter proposes a progressive radar–vision fusion method for surface target detection. Due to the significant differences in data characteristics between radar and visual sensors, direct fusion is nearly infeasible. Instead, the proposed method adopts a stepwise fusion strategy, consisting of coordinate calibration, shallow feature fusion, and deep feature integration. Experimental results show that this approach achieves an <inline-formula> <tex-math>$text {mAP}_{50}$ </tex-math></inline-formula> of 86.7% and an <inline-formula> <tex-math>$text {mAP}_{75}$ </tex-math></inline-formula> of 54.5%, outperforming YOLOv10 by 1.0% and 1.5%, respectively. Moreover, the proposed method significantly surpasses existing state-of-the-art radar–vision fusion approaches, demonstrating its superior effectiveness in complex environments.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144914173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combining Contrastive Learning and Diffusion Model for Hyperspectral Image Classification
Pub Date: 2025-08-21 | DOI: 10.1109/LGRS.2025.3601152 | IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5
Xiaorun Li;Jinhui Li;Shuhan Chen;Zeyu Cao
In recent years, self-supervised learning has made significant strides in hyperspectral image classification (HSIC). However, different approaches come with distinct strengths and limitations. Contrastive learning excels at extracting key information from large volumes of redundant data, but its training objective can inadvertently increase intraclass feature distance. To address this limitation, we leverage diffusion models (DMs) for their proven ability to refine and aggregate features by modeling complex data distributions. Specifically, DMs’ inherent denoising and generative processes are theoretically well-suited to enhance intraclass compactness by learning to reconstruct clean, representative features from perturbed inputs. We propose a new method, ContrastDM, which generates synthetic features that enrich the feature representation and partially alleviate sample sparsity. Classification experiments on three publicly available datasets demonstrate that ContrastDM significantly outperforms state-of-the-art methods.
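A generic sketch of how the two training signals named above could be paired: an InfoNCE term on two augmented views plus a DDPM-style noise-prediction term. The `denoiser` is a hypothetical eps-predictor and the one-line noising schedule is an assumption; the authors' actual objective may differ.

```python
import torch
import torch.nn.functional as F

def combined_loss(z1, z2, denoiser, feats, temperature=0.1):
    """InfoNCE on paired views (z1, z2) plus a noise-prediction loss
    that teaches `denoiser` to recover clean features from noisy ones."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature           # pairwise view similarity
    labels = torch.arange(z1.size(0))            # positives on the diagonal
    contrastive = F.cross_entropy(logits, labels)

    noise = torch.randn_like(feats)
    t = torch.rand(feats.size(0), 1)             # toy noising schedule
    noisy = (1 - t).sqrt() * feats + t.sqrt() * noise
    diffusion = F.mse_loss(denoiser(noisy, t), noise)
    return contrastive + diffusion

denoiser = lambda z, t: torch.zeros_like(z)      # placeholder eps-predictor
loss = combined_loss(torch.randn(8, 128), torch.randn(8, 128),
                     denoiser, torch.randn(8, 64))
```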
{"title":"Combining Contrastive Learning and Diffusion Model for Hyperspectral Image Classification","authors":"Xiaorun Li;Jinhui Li;Shuhan Chen;Zeyu Cao","doi":"10.1109/LGRS.2025.3601152","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3601152","url":null,"abstract":"In recent years, self-supervised learning has made significant strides in hyperspectral image classification (HSIC). However, different approaches come with distinct strengths and limitations. Contrastive learning excels at extracting key information from large volumes of redundant data, but its training objective can inadvertently increase intraclass feature distance. To address this limitation, we leverage diffusion models (DMs) for their proven ability to refine and aggregate features by modeling complex data distributions. Specifically, DMs’ inherent denoising and generative processes are theoretically well-suited to enhance intraclass compactness by learning to reconstruct clean, representative features from perturbed inputs. We propose the new method—ContrastDM. This approach generates synthetic features, improving and enriching feature representation, and partially addressing the issue of sample sparsity. Classification experiments on three publicly available datasets demonstrate that ContrastDM significantly outperforms state-of-the-art methods.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S3LBI: Spectral–Spatial Segmentation-Based Local Bicubic Interpolation for Single Hyperspectral Image Super-Resolution
Pub Date: 2025-08-21 | DOI: 10.1109/LGRS.2025.3601230 | IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5
Yubo Ma;Wei He;Siyu Cai;Qingke Zou
Single hyperspectral image (HSI) super-resolution (SR), limited by the lack of exterior information, has always been a challenging task. Much effort has gone into fully mining spectral information or adopting pretrained models to enhance spatial resolution. However, few SR approaches take structural features into account from the perspective of multidimensional segmentation of the image. Therefore, a novel spectral–spatial segmentation-based local bicubic interpolation (S3LBI) is proposed to perform segmentwise, blockwise interpolation tailored to the characteristics of HSI. Specifically, the bands of an HSI are clustered into several spectral segments, and super-pixel segmentation is then carried out within each spectral segment. After that, bicubic interpolation is conducted separately on each spectral–spatial segment. Experiments demonstrate the superiority of our S3LBI over the compared HSI SR approaches.
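A sketch of the spectral half of this pipeline only: cluster bands into spectral segments, then upscale each segment's bands with a cubic spline (a bicubic-like interpolant). The super-pixel split within each spectral segment, which makes the interpolation truly local, is omitted here; segment count and scale are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.ndimage import zoom

def s3lbi_sketch(hsi, n_seg=4, scale=2):
    """hsi: (bands, h, w). Group bands by spectral similarity, then
    interpolate segment by segment (here band by band within a segment)."""
    bands, h, w = hsi.shape
    labels = KMeans(n_clusters=n_seg, n_init=10).fit_predict(hsi.reshape(bands, -1))
    out = np.empty((bands, h * scale, w * scale))
    for seg in range(n_seg):
        for b in np.where(labels == seg)[0]:
            out[b] = zoom(hsi[b], scale, order=3)  # cubic-spline upscale
    return out

sr = s3lbi_sketch(np.random.rand(20, 16, 16))  # (20, 32, 32)
```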
{"title":"S3LBI: Spectral–Spatial Segmentation-Based Local Bicubic Interpolation for Single Hyperspectral Image Super-Resolution","authors":"Yubo Ma;Wei He;Siyu Cai;Qingke Zou","doi":"10.1109/LGRS.2025.3601230","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3601230","url":null,"abstract":"Single hyperspectral image (HSI) super-resolution (SR), which is limited by the lack of exterior information, has always been a challenging task. A lot of effort has gone into fully mining spectral information or adopting pretrained models to enhance spatial resolution. However, few SR approaches take into account structural features from the perspective of multidimensional segmentation of the image. Therefore, a novel spectral–spatial segmentation-based local bicubic interpolation (S3LBI) is proposed to implement segmented and blocked interpolation according to the characteristics of HSI. Specifically, the bands of an HSI are clustered into several spectral segments. Then, super-pixel segmentation is carried out in each spectral segment. After that, the bicubic interpolations are separately conducted on different spectral–spatial segments. Experiments demonstrate the superiority of our S3LBI over the compared HSI SR approaches.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144914174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multimodal-Guided Transformer Architecture for Remote Sensing Salient Object Detection
Pub Date: 2025-08-21 | DOI: 10.1109/LGRS.2025.3601083 | IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5
Bei Cheng;Zao Liu;Huxiao Tang;Qingwang Wang;Wenhao Chen;Tao Chen;Tao Shen
The latest remote sensing image saliency detectors rely primarily on RGB information alone. However, the spatial and geometric information embedded in depth images is robust to variations in lighting and color, and integrating depth information with RGB images can enhance the spatial structure of objects. In light of this, we propose a remote sensing image saliency detection model that fuses RGB and depth information, named the multimodal-guided transformer architecture (MGTA). Specifically, we first introduce the strongly correlated complementary fusion (SCCF) module to explore cross-modal consistency and similarity, maintaining consistency across different modalities while uncovering multidimensional common information. In addition, the global–local context information interaction (GLCII) module is designed to extract global semantic information and local detail information, effectively utilizing contextual information while reducing the number of parameters. Finally, a cascaded feature-guided decoder (CFGD) gradually fuses hierarchical decoding features, effectively integrating multilevel information and accurately locating targets. Extensive experiments demonstrate that our proposed model outperforms 14 state-of-the-art methods. The code and results of our method are available at https://github.com/Zackisliuzao/MGTANet
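As a rough, hypothetical analogue of the SCCF idea, one can model what the two modalities agree on (an elementwise product term) alongside what each contributes on its own (a sum term), then fuse both; this is a sketch of the cross-modal consistency/complementarity split, not the authors' module:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Fuse RGB and depth feature maps of equal shape (b, c, h, w)."""
    def __init__(self, c):
        super().__init__()
        self.fuse = nn.Conv2d(2 * c, c, kernel_size=1)

    def forward(self, rgb, depth):
        common = rgb * depth          # consistent, shared evidence
        complement = rgb + depth      # modality-specific contributions
        return self.fuse(torch.cat([common, complement], dim=1))

out = CrossModalFusion(32)(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64))
```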
{"title":"Multimodal-Guided Transformer Architecture for Remote Sensing Salient Object Detection","authors":"Bei Cheng;Zao Liu;Huxiao Tang;Qingwang Wang;Wenhao Chen;Tao Chen;Tao Shen","doi":"10.1109/LGRS.2025.3601083","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3601083","url":null,"abstract":"The latest remote sensing image saliency detectors primarily rely on RGB information alone. However, spatial and geometric information embedded in depth images is robust to variations in lighting and color. Integrating depth information with RGB images can enhance the spatial structure of objects. In light of this, we innovatively propose a remote sensing image saliency detection model that fuses RGB and depth information, named the multimodal-guided transformer architecture (MGTA). Specifically, we first introduce the strongly correlated complementary fusion (SCCF) module to explore cross-modal consistency and similarity, maintaining consistency across different modalities while uncovering multidimensional common information. In addition, the global–local context information interaction (GLCII) module is designed to extract global semantic information and local detail information, effectively utilizing contextual information while reducing the number of parameters. Finally, a cascaded feature-guided decoder (CFGD) is employed to gradually fuse hierarchical decoding features, effectively integrating multilevel data and accurately locating target positions. Extensive experiments demonstrate that our proposed model outperforms 14 state-of-the-art methods. The code and results of our method are available at <uri>https://github.com/Zackisliuzao/MGTANet</uri>","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144998172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Channel Characterization Based on 3-D TransUnet-CBAM With Multiloss Function
Pub Date: 2025-08-21 | DOI: 10.1109/LGRS.2025.3601200 | IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5
Binpeng Yan;Jiaqi Zhao;Mutian Li;Rui Pan
The channel system is intimately linked to the formation of oil and gas reservoirs. In petroliferous basins, channel deposits frequently serve as both storage spaces and fluid conduits; consequently, the accurate identification of channels in 3-D seismic data is critical for reservoir prediction. Traditional seismic attribute-based methods can outline channel boundaries, but noise and stratigraphic complexity introduce discontinuities that reduce accuracy and require extensive manual correction. Deep learning-based methods outperform conventional methods in efficiency and precision, yet the similar seismic signatures of channels and continuous karst caves in seismic profiles can still mislead existing models. To address this challenge, we propose an improved variant of the 3-D TransUnet model for 3-D seismic data recognition. The model incorporates channel and spatial attention mechanisms into the skip connections of the TransUnet architecture, effectively enhancing its feature representation capability and recognition accuracy. In addition, a multiloss function is introduced to improve the delineation and continuity of channels while increasing the model’s robustness against nonchannel interference features. Experiments on synthetic and field seismic data confirm superior boundary delineation, continuity, and noise resistance compared with baseline methods.
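The letter's exact multiloss composition is not reproduced here, but a common pairing for thin, continuous structures such as channels is per-voxel BCE plus a Dice term that rewards region continuity; a sketch with an assumed equal weighting:

```python
import torch
import torch.nn.functional as F

def multi_loss(pred, target, alpha=0.5, eps=1e-6):
    """pred: raw logits, target: binary channel mask, both (b, 1, d, h, w).
    BCE drives per-voxel accuracy; Dice penalizes broken channel bodies."""
    bce = F.binary_cross_entropy_with_logits(pred, target)
    p = torch.sigmoid(pred)
    inter = (p * target).sum()
    dice = 1 - (2 * inter + eps) / (p.sum() + target.sum() + eps)
    return alpha * bce + (1 - alpha) * dice

pred = torch.randn(1, 1, 8, 64, 64)
target = (torch.rand_like(pred) > 0.5).float()
loss = multi_loss(pred, target)
```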
{"title":"Channel Characterization Based on 3-D TransUnet-CBAM With Multiloss Function","authors":"Binpeng Yan;Jiaqi Zhao;Mutian Li;Rui Pan","doi":"10.1109/LGRS.2025.3601200","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3601200","url":null,"abstract":"The channel system is intimately linked to the formation of oil and gas reservoirs. In petroliferous basins, channel deposits frequently serve as both storage spaces and fluid conduits. Consequently, the accurate identification of channels in 3-D seismic data is, therefore, critical for reservoir prediction. Traditional seismic attribute-based methods can outline channel boundaries, but noise and stratigraphic complexity introduce discontinuities that reduce accuracy and require extensive manual correction. Deep learning-based methods outperform conventional methods in terms of efficiency and precision. However, the similar seismic signatures of channels and continuous karst caves in seismic profiles can still mislead the existing models. To address this challenge, we proposed an improved variant of the 3-D TransUnet model for 3-D seismic data recognition. The model incorporates channel and spatial attention mechanisms into the skip connections of the TransUnet architecture, effectively enhancing its feature representation capability and recognition accuracy. In addition, a multiloss function is introduced to improve the delineation and continuity of the channel while increasing the model’s robustness against nonchannel interference features. Experiments on synthetic and field seismic data confirm superior boundary delineation, continuity, and noise resistance compared with baseline methods.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144914218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}