Dual-Path Deep Unsupervised Learning for Multi-Focus Image Fusion

IEEE Transactions on Multimedia (IF 9.9; CAS Region 1, Computer Science; JCR Q1, Computer Science, Information Systems). Pub Date: 2024-12-23. DOI: 10.1109/TMM.2024.3521817

Yuhui Quan; Xi Wan; Tianxiang Zheng; Yan Huang; Hui Ji

IEEE Transactions on Multimedia, vol. 27, pp. 1165-1176. https://ieeexplore.ieee.org/document/10812788/

Citations: 0

Abstract

Multi-focus image fusion (MFIF) aims at merging multiple images captured at different focal lengths to create an all-in-focus image. This paper introduces a fully unsupervised learning approach for MFIF that uses only pairs of defocused images for end-to-end training, bypassing the need for ground-truths in supervised learning. Unlike existing methods training via a similarity loss between fused and source images, we propose a dual-path learning framework comprising two networks: an image fuser and a mask predictor. The mask predictor is modeled as a self-supervised denoising network on imperfect fusion masks, trained with a masking-based unsupervised learning scheme. The image fuser, crafted with deep unrolling, leverages the output from the mask predictor to supervise its mask generation at each unrolled step. Moreover, we introduce a fusion consistency loss to ensure the alignment between the image fuser and the mask predictor. In extensive experiments, our proposed approach shows superiority over existing end-to-end unsupervised methods and competitive performance against the supervised ones.
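
The abstract does not give the fusion formula or the form of the fusion consistency loss, so the following is only a minimal sketch under stated assumptions: that fusion is a per-pixel blend of two source images weighted by a soft mask, and that consistency can be measured as the mean absolute gap between the fuser's output and the blend implied by the mask predictor's output. The function names `fuse_with_mask` and `fusion_consistency` are hypothetical and do not come from the paper.

```python
import numpy as np


def fuse_with_mask(src_a, src_b, mask):
    """Blend two source images with a soft decision mask.

    Assumption (not from the paper): mask values near 1 select src_a,
    values near 0 select src_b.
    """
    mask = np.clip(mask, 0.0, 1.0)
    return mask * src_a + (1.0 - mask) * src_b


def fusion_consistency(fuser_output, src_a, src_b, predicted_mask):
    """Hypothetical consistency term between the two paths.

    Compares the image fuser's output with the image obtained by
    blending the sources with the mask predictor's mask, using the
    mean absolute difference.
    """
    reference = fuse_with_mask(src_a, src_b, predicted_mask)
    return float(np.mean(np.abs(fuser_output - reference)))


# Toy 2x2 example: the top row is taken from src_a, the bottom row from src_b.
src_a = np.array([[1.0, 1.0], [0.2, 0.2]])
src_b = np.array([[0.5, 0.5], [0.8, 0.8]])
mask = np.array([[1.0, 1.0], [0.0, 0.0]])

fused = fuse_with_mask(src_a, src_b, mask)
loss = fusion_consistency(fused, src_a, src_b, mask)
```

In this toy case the fuser output is built from the same mask, so the consistency term is zero; in training, the two paths would generally disagree and the term would be positive.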

Source journal

IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)

CiteScore: 11.70

Self-citation rate: 11.00%

Articles published: 576

Review time: 5.5 months

Journal description: The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.