Visual fidelity and full-scale interaction driven network for infrared and visible image fusion

IF 7.6 · Region 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pattern Recognition · Pub Date: 2025-09-01 · Epub Date: 2025-03-26 · DOI: 10.1016/j.patcog.2025.111612
Liye Mei, Xinglong Hu, Zhaoyi Ye, Zhiwei Ye, Chuan Xu, Sheng Liu, Cheng Lei
Pattern Recognition, Volume 165, Article 111612
Citations: 0

Abstract

The objective of infrared and visible image fusion is to combine the unique strengths of the source images into a single image that serves both human visual perception and machine detection. Existing fusion networks still fall short in effectively characterizing and retaining source-image features. To address these deficiencies, we propose a visual fidelity and full-scale interaction driven network for infrared and visible image fusion, named VFFusion. First, a multi-scale feature encoder based on BiFormer is constructed, and a feature cascade interaction module is designed to perform full-scale interaction on features distributed across different scales. In addition, a visual fidelity branch is built to process multi-scale features in parallel with the fusion branch. Specifically, the visual fidelity branch uses blurred images for self-supervised training in the constructed auxiliary task, thereby obtaining an effective representation of the source image information. By exploiting the complementary representational features of infrared and visible images as supervisory information, it constrains the fusion branch to retain the source image features in the fused image. Notably, the visual fidelity branch employs a multi-scale joint reconstruction loss, using the rich supervisory signals provided by multi-scale original images to enhance the feature representation of targets at different scales, yielding clearly fused targets. Extensive qualitative and quantitative comparative experiments against nine advanced methods on four datasets demonstrate the superiority of our approach. The source code is available at https://github.com/XingLongH/VFFusion.
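The two mechanisms described above, self-supervision from blurred inputs and a multi-scale joint reconstruction loss, can be sketched in a toy form. This is a minimal 1-D illustration under stated assumptions, not the paper's implementation: the function names (`box_blur`, `make_self_supervised_pair`, `multi_scale_l1`), the box blur, the average-pool downsampling, and the per-scale L1 terms with halving resolution per scale are all assumptions for illustration.

```python
# Toy 1-D sketch (assumed details, not the paper's implementation) of:
# (1) a self-supervised auxiliary task that feeds a blurred signal and
#     targets the original, so no manual labels are needed, and
# (2) a multi-scale joint reconstruction loss with one L1 term per scale.

def box_blur(signal, k=3):
    """Box blur with edge clamping (toy stand-in for image blurring)."""
    n = len(signal)
    half = k // 2
    return [
        sum(signal[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1)) / k
        for i in range(n)
    ]

def make_self_supervised_pair(signal):
    """Blurred input, original target: supervision comes from the data itself."""
    return box_blur(signal), signal

def downsample(signal, factor):
    """Average-pool by `factor` to mimic the encoder's coarser scales."""
    return [
        sum(signal[i:i + factor]) / factor
        for i in range(0, len(signal) - factor + 1, factor)
    ]

def multi_scale_l1(reconstructions, original, weights=(1.0, 0.5, 0.25)):
    """Weighted sum of per-scale L1 terms; scale s is compared against the
    original downsampled by a factor of 2**s."""
    loss = 0.0
    for s, (recon, w) in enumerate(zip(reconstructions, weights)):
        target = downsample(original, 2 ** s)
        loss += w * sum(abs(r - t) for r, t in zip(recon, target)) / len(target)
    return loss
```

In this sketch a perfect reconstruction at every scale gives zero loss, while an error at any single scale contributes its weighted L1 term, which is the sense in which the multi-scale original images supply supervisory signals at each resolution.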
Source journal: Pattern Recognition (Engineering Technology - Engineering: Electronic & Electrical)
CiteScore: 14.40
Self-citation rate: 16.20%
Annual articles: 683
Review time: 5.6 months
Journal introduction: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.
Latest articles in this journal:
IrisMAE: Structure-aware masked image modeling for iris recognition
Minimizing the pretraining gap: Domain-aligned text-based person retrieval
Stealthy backdoor attack method targeting group fairness in self-supervised learning
Single-domain generalization for fastener detection via sample reconstruction and class-wise domain contrast
EdgeFusionNet: Edge information-guided small object detection for remote sensing images