Dual Branch Masked Transformer for Hyperspectral Image Classification
Kuo Li; Yushi Chen; Lingbo Huang
DOI: 10.1109/LGRS.2024.3490534
IEEE Geoscience and Remote Sensing Letters, vol. 21, pp. 1-5
Published: 2024-11-05 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10745157/
Citations: 0
Abstract
Transformers have been widely used in hyperspectral image (HSI) classification because of their ability to capture long-range dependencies. However, most Transformer-based classification methods either fail to extract local information or do not combine spatial and spectral information well, resulting in insufficient feature extraction. To address these issues, this study proposes a dual-branch masked Transformer (Dual-MTr) model. The masked Transformer (MTr) pretrains a vision transformer (ViT) by reconstructing both the masked spatial image and the masked spectrum, which embeds a local bias through the process of recovering the global original input from localized patches. Different tokenization methods are used for the two input types: patch embedding with overlapping regions for 2-D spatial data, and group embedding for 1-D spectral data. Supervised learning is added to the pretraining process to enhance discriminability. A dual-branch structure then combines the spatial and spectral features. To strengthen the connection between the two branches, Kullback-Leibler (KL) divergence is used to measure the difference between their classification results, and the resulting loss is incorporated into the training process. Experimental results on two hyperspectral datasets demonstrate the effectiveness of the proposed method compared with other methods.
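The KL-based coupling of the two branches described above can be sketched as a training loss: each branch is trained with cross-entropy against the labels, and a KL term penalizes disagreement between the two branches' class distributions. This is a minimal illustration, not the authors' implementation; the function names, the symmetric form of the KL term, and the weighting factor `lam` are assumptions — the letter only states that a KL divergence between the two branches' classification results contributes a loss to training.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the class (last) axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) per sample, averaged over the batch
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * np.log(p / q), axis=-1)))

def dual_branch_loss(logits_spatial, logits_spectral, labels, lam=0.1):
    # Cross-entropy for each branch plus a symmetric KL consistency term.
    # `lam` (the weight on the KL term) is a hypothetical hyperparameter.
    p_spa = softmax(logits_spatial)
    p_spe = softmax(logits_spectral)
    n = labels.shape[0]
    ce_spa = -np.mean(np.log(np.clip(p_spa[np.arange(n), labels], 1e-12, 1.0)))
    ce_spe = -np.mean(np.log(np.clip(p_spe[np.arange(n), labels], 1e-12, 1.0)))
    consistency = 0.5 * (kl_divergence(p_spa, p_spe) + kl_divergence(p_spe, p_spa))
    return ce_spa + ce_spe + lam * consistency
```

When the two branches produce identical logits, the consistency term vanishes and the loss reduces to the sum of the two cross-entropies, so the KL term only activates when the spatial and spectral predictions disagree.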