Hyperspectral and multispectral images fusion based on pyramid swin transformer
Han Lang, Wenxing Bao, Wei Feng, Kewen Qu, Xuan Ma, Xiaowu Zhang
Infrared Physics & Technology, vol. 143, Article 105617. Published 2024-11-07. DOI: 10.1016/j.infrared.2024.105617
Impact factor: 3.1. JCR: Q2 (Instruments & Instrumentation).
https://www.sciencedirect.com/science/article/pii/S1350449524005012
Citation count: 0
Abstract
Remote sensing image fusion aims to generate a high spatial resolution hyperspectral image (HR-HSI) by integrating a low spatial resolution hyperspectral image (LR-HSI) and a high spatial resolution multispectral image (HR-MSI). Although Convolutional Neural Networks (CNNs) have been widely applied to the HSI-MSI fusion problem, their limited receptive field makes it difficult to capture global relationships within the feature maps. Transformers can model such relationships, but their computational complexity hinders their application, especially to high-dimensional data such as hyperspectral images (HSIs). To address both limitations, we propose an HSI-MSI fusion method based on the Pyramid Swin Transformer (PSTF). The pyramid design of the PSTF extracts multi-scale information from the images. Its Spatial–Spectral Crossed Attention (SSCA) module comprises the Cross Spatial Attention (CSA) and Spectral Feature Integration (SFI) modules. The CSA module employs a cross-shaped self-attention mechanism, which models different spatial scales and non-local structures more flexibly than traditional convolutional layers. The SFI module introduces a global memory block (MB) to select the most relevant low-rank spectral vectors, integrating global spectral information with local spatial–spectral correlation to better extract and preserve spectral information. In addition, the Separate Feature Extraction (SFE) module enhances the network's ability to represent image features by processing the positive and negative parts of shallow features independently, thus capturing details and structures more effectively and preventing the vanishing gradient problem. Experimental comparisons with state-of-the-art (SOTA) methods demonstrate the effectiveness of the PSTF method.
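The abstract does not specify how the SFE module separates shallow features, so the following is only a minimal sketch of the general idea of splitting a feature map into positive and negative parts. The helper names `split_pos_neg` and `merge_pos_neg`, the toy 2-D feature map, and the choice of plain Python lists are all assumptions made for illustration; they are not taken from the paper.

```python
def split_pos_neg(features):
    """Split a 2-D feature map into non-negative positive and negative parts.

    Hypothetical helper (not from the paper): `pos` keeps values above zero,
    `neg` keeps the magnitudes of values below zero, so that
    features == pos - neg elementwise.
    """
    pos = [[max(v, 0.0) for v in row] for row in features]
    neg = [[max(-v, 0.0) for v in row] for row in features]
    return pos, neg


def merge_pos_neg(pos, neg):
    """Recombine the two branches; with no processing in between this is lossless."""
    return [[p - n for p, n in zip(prow, nrow)] for prow, nrow in zip(pos, neg)]


# Toy shallow feature map with mixed signs (illustrative values only).
shallow = [[0.5, -1.25], [-0.75, 2.0]]
pos, neg = split_pos_neg(shallow)
restored = merge_pos_neg(pos, neg)
```

A single ReLU would keep only `pos` and discard the negative responses. Keeping `neg` as a separate branch means both halves of the signal remain available to later layers, which is one plausible reading of the "independently processing positive and negative parts" described in the abstract.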
About the journal:
The journal covers the entire field of infrared physics and technology: theory, experiment, application, devices and instrumentation. "Infrared" is defined as covering the near, mid and far infrared (terahertz) regions from 0.75 µm (750 nm) to 1 mm (300 GHz). Submissions in the 300 GHz to 100 GHz region may be accepted at the editors' discretion if their content is relevant to shorter wavelengths. Submissions must be primarily concerned with and directly relevant to this spectral region.
Its core topics can be summarized as the generation, propagation and detection of infrared radiation; the associated optics, materials and devices; and its use in all fields of science, industry, engineering and medicine.
Infrared techniques occur in many different fields, notably spectroscopy and interferometry; material characterization and processing; atmospheric physics, astronomy and space research. Scientific aspects include lasers, quantum optics, quantum electronics, image processing and semiconductor physics. Some important applications are medical diagnostics and treatment, industrial inspection and environmental monitoring.