Visual fidelity and full-scale interaction driven network for infrared and visible image fusion

IF 7.6 · Region 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pattern Recognition · Pub Date: 2025-09-01 · Epub Date: 2025-03-26 · DOI: 10.1016/j.patcog.2025.111612
Liye Mei, Xinglong Hu, Zhaoyi Ye, Zhiwei Ye, Chuan Xu, Sheng Liu, Cheng Lei
Pattern Recognition, Volume 165, Article 111612
Citations: 0

Abstract

The objective of infrared and visible image fusion is to combine the unique strengths of the source images into a single image that serves both human visual perception and machine detection. Existing fusion networks still fall short in effectively characterizing and retaining source-image features. To address these deficiencies, we propose a visual fidelity and full-scale interaction driven network for infrared and visible image fusion, named VFFusion. First, a multi-scale feature encoder based on BiFormer is constructed, and a feature cascade interaction module is designed to perform full-scale interaction on features distributed across different scales. In addition, a visual fidelity branch is built to process multi-scale features in parallel with the fusion branch. Specifically, the visual fidelity branch uses blurred images for self-supervised training in the constructed auxiliary task, thereby obtaining an effective representation of the source image information. By exploiting the complementary representational features of infrared and visible images as supervisory information, it constrains the fusion branch to retain the source image features in the fused image. Notably, the visual fidelity branch employs a multi-scale joint reconstruction loss, using the rich supervisory signals provided by multi-scale original images to enhance the feature representation of targets at different scales, yielding clearly fused targets. Extensive qualitative and quantitative comparative experiments against nine advanced methods on four datasets demonstrate the superiority of our approach. The source code is available at https://github.com/XingLongH/VFFusion.
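The two mechanisms described above, self-supervision from blurred inputs and a multi-scale joint reconstruction loss, can be sketched in a toy form. This is a minimal 1-D illustration under stated assumptions, not the paper's implementation: the function names (`box_blur`, `make_self_supervised_pair`, `multi_scale_l1`), the box blur, the average-pool downsampling, and the per-scale L1 terms with halving resolution per scale are all assumptions for illustration.

```python
# Toy 1-D sketch (assumed details, not the paper's implementation) of:
# (1) a self-supervised auxiliary task that feeds a blurred signal and
#     targets the original, so no manual labels are needed, and
# (2) a multi-scale joint reconstruction loss with one L1 term per scale.

def box_blur(signal, k=3):
    """Box blur with edge clamping (toy stand-in for image blurring)."""
    n = len(signal)
    half = k // 2
    return [
        sum(signal[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1)) / k
        for i in range(n)
    ]

def make_self_supervised_pair(signal):
    """Blurred input, original target: supervision comes from the data itself."""
    return box_blur(signal), signal

def downsample(signal, factor):
    """Average-pool by `factor` to mimic the encoder's coarser scales."""
    return [
        sum(signal[i:i + factor]) / factor
        for i in range(0, len(signal) - factor + 1, factor)
    ]

def multi_scale_l1(reconstructions, original, weights=(1.0, 0.5, 0.25)):
    """Weighted sum of per-scale L1 terms; scale s is compared against the
    original downsampled by a factor of 2**s."""
    loss = 0.0
    for s, (recon, w) in enumerate(zip(reconstructions, weights)):
        target = downsample(original, 2 ** s)
        loss += w * sum(abs(r - t) for r, t in zip(recon, target)) / len(target)
    return loss
```

In this sketch a perfect reconstruction at every scale gives zero loss, while an error at any single scale contributes its weighted L1 term, which is the sense in which the multi-scale original images supply supervisory signals at each resolution.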
Source journal: Pattern Recognition (Engineering Technology - Engineering: Electronic & Electrical)
CiteScore: 14.40
Self-citation rate: 16.20%
Annual articles: 683
Review time: 5.6 months
Journal introduction: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.
Latest articles in this journal:
IrisMAE: Structure-aware masked image modeling for iris recognition
Minimizing the pretraining gap: Domain-aligned text-based person retrieval
Stealthy backdoor attack method targeting group fairness in self-supervised learning
Single-domain generalization for fastener detection via sample reconstruction and class-wise domain contrast
EdgeFusionNet: Edge information-guided small object detection for remote sensing images