Middle-output deep image prior for blind hyperspectral and multispectral image fusion
Jorge Bacca, Christian Arcos, Juan Marcos Ramírez, Henry Arguello
Signal Processing: Image Communication, Volume 132, Article 117247 (published 2024-12-26). DOI: 10.1016/j.image.2024.117247. Available at: https://www.sciencedirect.com/science/article/pii/S0923596524001486
Citations: 0
Abstract
Obtaining a low-spatial-resolution hyperspectral (HS) image or a low-spectral-resolution multispectral (MS) image from a high-resolution (HR) spectral image is straightforward when the acquisition models are known. However, the reverse process, from HS and MS to HR, is an ill-posed problem known as spectral image fusion. Although recent fusion techniques based on supervised deep learning have shown promising results, these methods require large training datasets, which entail expensive acquisition costs and long training times. In contrast, unsupervised HS and MS image fusion methods have emerged as an alternative that avoids this data demand; however, they rely on knowledge of the linear degradation models for optimal performance. To overcome these challenges, we propose the Middle-Output Deep Image Prior (MODIP) for unsupervised blind HS and MS image fusion. MODIP is fitted to the HS and MS images, and the HR fused image is estimated at an intermediate layer within the network. The architecture comprises two convolutional networks that reconstruct the HR spectral image from the HS and MS inputs, along with two networks that appropriately downscale the estimated HR image to match the available MS and HS images, thereby learning the non-linear degradation models. The network parameters of MODIP are jointly and iteratively adjusted by minimizing a proposed loss function. This approach can handle scenarios where the degradation operators are unknown or only partially estimated. To evaluate the performance of MODIP, we test the fusion approach on three simulated spectral image datasets (Pavia University, Salinas Valley, and CAVE) and on a real dataset obtained through a testbed implementation in an optical lab. Extensive simulations demonstrate that MODIP outperforms other unsupervised model-based image fusion methods by up to 6 dB in PSNR.
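The abstract describes an unsupervised fusion loop: a network estimates the HR image from the HS and MS observations, and two additional networks learn the spatial and spectral degradations so that the estimate can be checked against the available data. The sketch below illustrates that general idea only; it is not the authors' MODIP implementation, it omits the middle-output detail of the architecture, and every class name, layer size, and loss weight here is a hypothetical assumption.

```python
# Minimal sketch of an unsupervised HS/MS fusion loop with learned degradations.
# All architectural choices below are illustrative assumptions, not MODIP itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Estimates the HR image (HS spectral bands at the MS spatial size)."""
    def __init__(self, bands_hs: int, bands_ms: int, width: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(bands_hs + bands_ms, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, bands_hs, 3, padding=1),
        )

    def forward(self, hs_up: torch.Tensor, ms: torch.Tensor) -> torch.Tensor:
        # hs_up: HS image bilinearly upsampled to the MS spatial grid
        return self.net(torch.cat([hs_up, ms], dim=1))

class SpatialDegradation(nn.Module):
    """Learns a non-linear spatial downscaling HR -> HS (factor `scale`)."""
    def __init__(self, bands_hs: int, scale: int):
        super().__init__()
        self.conv = nn.Conv2d(bands_hs, bands_hs, 3, padding=1)
        self.scale = scale

    def forward(self, hr: torch.Tensor) -> torch.Tensor:
        return F.avg_pool2d(torch.relu(self.conv(hr)), self.scale)

class SpectralDegradation(nn.Module):
    """Learns a non-linear spectral reduction HR -> MS (1x1 convolutions)."""
    def __init__(self, bands_hs: int, bands_ms: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(bands_hs, bands_ms, 1), nn.ReLU(),
            nn.Conv2d(bands_ms, bands_ms, 1),
        )

    def forward(self, hr: torch.Tensor) -> torch.Tensor:
        return self.net(hr)

def fuse(hs: torch.Tensor, ms: torch.Tensor, scale: int, iters: int = 2000):
    """hs: (1, bands_hs, h, w); ms: (1, bands_ms, h*scale, w*scale)."""
    bands_hs, bands_ms = hs.shape[1], ms.shape[1]
    gen = Generator(bands_hs, bands_ms)
    deg_spatial = SpatialDegradation(bands_hs, scale)
    deg_spectral = SpectralDegradation(bands_hs, bands_ms)
    params = (list(gen.parameters()) + list(deg_spatial.parameters())
              + list(deg_spectral.parameters()))
    opt = torch.optim.Adam(params, lr=1e-3)

    hs_up = F.interpolate(hs, size=ms.shape[-2:], mode="bilinear",
                          align_corners=False)
    for _ in range(iters):
        opt.zero_grad()
        hr_est = gen(hs_up, ms)
        # Self-supervised consistency: the degraded HR estimate must match
        # the observed HS and MS images (no ground-truth HR image is used).
        loss = (F.mse_loss(deg_spatial(hr_est), hs)
                + F.mse_loss(deg_spectral(hr_est), ms))
        loss.backward()
        opt.step()
    return gen(hs_up, ms).detach()
```

The sketch keeps only the joint, iterative minimization of the two consistency terms; in the paper, the HR estimate is additionally taken from an intermediate (middle) layer of the reconstruction network and the degradation networks are what make the method blind to unknown acquisition models.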
About the journal:
Signal Processing: Image Communication is an international journal for the development of the theory and practice of image communication. Its primary objectives are the following:
To present a forum for the advancement of theory and practice of image communication.
To stimulate cross-fertilization between areas that are similar in nature but have traditionally been separated, for example, various aspects of visual communications and information systems.
To contribute to a rapid information exchange between the industrial and academic environments.
The editorial policy and the technical content of the journal are the responsibility of the Editor-in-Chief, the Area Editors and the Advisory Editors. The Journal is self-supporting from subscription income and contains a minimum amount of advertisements. Advertisements are subject to the prior approval of the Editor-in-Chief. The journal welcomes contributions from every country in the world.
Signal Processing: Image Communication publishes articles relating to aspects of the design, implementation and use of image communication systems. The journal features original research work, tutorial and review articles, and accounts of practical developments.
Subjects of interest include image/video coding, 3D video representations and compression, 3D graphics and animation compression, HDTV and 3DTV systems, video adaptation, video over IP, peer-to-peer video networking, interactive visual communication, multi-user video conferencing, wireless video broadcasting and communication, visual surveillance, 2D and 3D image/video quality measures, pre/post processing, video restoration and super-resolution, multi-camera video analysis, motion analysis, content-based image/video indexing and retrieval, face and gesture processing, video synthesis, 2D and 3D image/video acquisition and display technologies, architectures for image/video processing and communication.