KDFuse: A high-level vision task-driven infrared and visible image fusion method based on cross-domain knowledge distillation

Information Fusion, vol. 118, Article 102944 (IF 15.5; CAS Zone 1, Computer Science; JCR Q1, Computer Science, Artificial Intelligence) Pub Date: 2025-01-14 DOI: 10.1016/j.inffus.2025.102944

Chenjia Yang, Xiaoqing Luo, Zhancheng Zhang, Zhiguo Chen, Xiao-jun Wu


Citations: 0

Abstract

To make fusion features more comprehensive and to meet the requirements of high-level vision tasks, some fusion methods coordinate the fusion process by interacting directly with high-level semantic features. However, because of the significant disparity between the high-level semantic domain and the fusion representation domain, such direct interaction leaves room for more effective collaboration. To overcome this obstacle, a high-level vision task-driven infrared and visible image fusion method based on cross-domain knowledge distillation, referred to as KDFuse, is proposed. KDFuse brings multi-task perceptual representations into the same domain through cross-domain knowledge distillation. By facilitating interaction between semantic information and fusion information at an equivalent level, it reduces the gap between the semantic and fusion domains and enables multi-task collaborative fusion. Specifically, to acquire the superior high-level semantic representations needed to instruct the fusion network, the multi-domain interaction distillation module (MIDM) establishes a teaching relationship that realizes multi-task collaboration. The multi-scale semantic perception module (MSPM) learns to capture semantic information through cross-domain knowledge distillation, and the semantic detail integration module (SDIM) integrates the fusion-level semantic representations with the fusion-level visual representations. Moreover, to balance the semantic and visual representations during fusion, the Fourier transform is introduced into the loss function. Extensive experiments demonstrate the effectiveness of the proposed method in both image fusion and downstream tasks. The source code is available at https://github.com/lxq-jnu/KDFuse.
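The abstract names two training ingredients, knowledge distillation and a Fourier-transform term in the loss, without giving their exact form. The following is only a minimal NumPy sketch of what such terms could look like, not the authors' implementation: the function names, the mean-squared-error distillation term, the amplitude-spectrum comparison, and the weights `alpha` and `beta` are all assumptions made for illustration.

```python
import numpy as np


def feature_distillation_loss(student_feat, teacher_feat):
    """Mean squared error between student and teacher feature maps.

    Assumption: an MSE feature-matching term; the paper may use another form.
    """
    student_feat = np.asarray(student_feat, dtype=np.float64)
    teacher_feat = np.asarray(teacher_feat, dtype=np.float64)
    if student_feat.shape != teacher_feat.shape:
        raise ValueError("feature maps must share a shape")
    return float(np.mean((student_feat - teacher_feat) ** 2))


def fourier_amplitude_loss(fused, reference):
    """Mean absolute difference between 2-D FFT amplitude spectra.

    Assumption: the Fourier term compares amplitude spectra of two images.
    """
    fused_amp = np.abs(np.fft.fft2(np.asarray(fused, dtype=np.float64)))
    ref_amp = np.abs(np.fft.fft2(np.asarray(reference, dtype=np.float64)))
    return float(np.mean(np.abs(fused_amp - ref_amp)))


def total_loss(student_feat, teacher_feat, fused, reference,
               alpha=1.0, beta=0.1):
    """Weighted sum of the two terms; alpha and beta are hypothetical weights."""
    return (alpha * feature_distillation_loss(student_feat, teacher_feat)
            + beta * fourier_amplitude_loss(fused, reference))
```

In this sketch the distillation term pulls the student's features toward the teacher's, while the frequency-domain term penalizes differences in global spectral content that a pixel-wise loss may weight differently. The released code at the repository above is the reference for the actual formulation.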




Source journal

Information Fusion (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 33.20

Self-citation rate: 4.30%

Articles published: 161

Review time: 7.9 months

Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.