{"title":"Cyclic Cross-Modality Interaction for Hyperspectral and Multispectral Image Fusion","authors":"Shi Chen;Lefei Zhang;Liangpei Zhang","doi":"10.1109/TCSVT.2024.3461829","DOIUrl":null,"url":null,"abstract":"Integrating low-resolution hyperspectral images with high-resolution multispectral images is an effective approach to derive high-resolution hyperspectral images. Recently, numerous deep learning-based approaches have been employed to model the mapping relationships for the fusion directly. However, these methods often neglect the spectral characteristics and fail to facilitate comprehensive interactions among global features from heterogeneous modalities. In this paper, we propose a novel cyclic Transformer based on the cross-modality spatial-spectral interaction, exploiting diverse interaction modes to explore the similarity and complementarity among cross-modality features. Specifically, we design a cyclic interactive architecture to fully exploit the abundant spectral prior information in low-resolution hyperspectral images and the rich spatial prior information in high-resolution multispectral images. By incorporating spatial and spectral priors into the attention mechanisms in Transformer modules, we explore the long-range dependency information within the cross-modality features. Furthermore, to enhance interaction among features from different modalities, we devise the cross-modality adaptive interaction mechanisms in both spatial and spectral dimensions to facilitate information reciprocity between different modalities. Extensive experiments demonstrate that the proposed approach outperforms the state-of-the-art fusion methods both quantitatively and visually. The code is available at <uri>https://github.com/Tomchenshi/CYformer</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 1","pages":"741-753"},"PeriodicalIF":11.1000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10681101/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0
Abstract
Integrating low-resolution hyperspectral images with high-resolution multispectral images is an effective approach to deriving high-resolution hyperspectral images. Recently, numerous deep learning-based approaches have been employed to directly model the mapping relationship for fusion. However, these methods often neglect spectral characteristics and fail to facilitate comprehensive interactions among global features from heterogeneous modalities. In this paper, we propose a novel cyclic Transformer based on cross-modality spatial-spectral interaction, exploiting diverse interaction modes to explore the similarity and complementarity among cross-modality features. Specifically, we design a cyclic interactive architecture to fully exploit the abundant spectral prior information in low-resolution hyperspectral images and the rich spatial prior information in high-resolution multispectral images. By incorporating spatial and spectral priors into the attention mechanisms of the Transformer modules, we capture long-range dependencies within the cross-modality features. Furthermore, to enhance interaction among features from different modalities, we devise cross-modality adaptive interaction mechanisms in both the spatial and spectral dimensions to facilitate information reciprocity between modalities. Extensive experiments demonstrate that the proposed approach outperforms state-of-the-art fusion methods both quantitatively and visually. The code is available at https://github.com/Tomchenshi/CYformer.
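The abstract describes cross-modality interaction in which features from one modality steer the attention computed over the other. The following PyTorch sketch illustrates that general idea only: a generic cross-attention block whose queries come from one branch (e.g., hyperspectral tokens) and whose keys/values come from the other (e.g., multispectral tokens). All names and hyperparameters (CrossModalityAttention, embed_dim, num_heads) are hypothetical assumptions for illustration and are not taken from the authors' CYformer implementation.

```python
# Minimal sketch of a cross-modality attention block, assuming token features
# of equal embedding width from both modalities. Not the authors' code.
import torch
import torch.nn as nn


class CrossModalityAttention(nn.Module):
    """Attend from one modality's tokens (query) to the other's (key/value)."""

    def __init__(self, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.norm_q = nn.LayerNorm(embed_dim)
        self.norm_kv = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x_q: torch.Tensor, x_kv: torch.Tensor) -> torch.Tensor:
        # x_q:  (B, N, C) tokens from the query modality (e.g., LR-HSI features)
        # x_kv: (B, M, C) tokens from the other modality (e.g., HR-MSI features)
        q = self.norm_q(x_q)
        kv = self.norm_kv(x_kv)
        out, _ = self.attn(q, kv, kv)   # queries from x_q, keys/values from x_kv
        return x_q + self.proj(out)     # residual keeps the query-modality stream


if __name__ == "__main__":
    B, N, M, C = 2, 256, 256, 64
    hsi_tokens = torch.randn(B, N, C)   # toy spectral-branch tokens
    msi_tokens = torch.randn(B, M, C)   # toy spatial-branch tokens
    block = CrossModalityAttention(embed_dim=C, num_heads=4)
    fused = block(hsi_tokens, msi_tokens)
    print(fused.shape)                  # torch.Size([2, 256, 64])
```

In a cyclic design of the kind the abstract outlines, such a block could be applied in both directions (HSI attending to MSI and MSI attending to HSI) and in both spatial and spectral token layouts; the single-direction block above is only the basic building element.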
About the Journal
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.