PromptFusion: Harmonized Semantic Prompt Learning for Infrared and Visible Image Fusion

Impact factor: 19.2 | Zone 1 (Computer Science) | JCR Q1 (Automation & Control Systems) | IEEE/CAA Journal of Automatica Sinica | Pub Date: 2024-12-24 | DOI: 10.1109/JAS.2024.124878

Jinyuan Liu;Xingyuan Li;Zirui Wang;Zhiying Jiang;Wei Zhong;Wei Fan;Bin Xu

IEEE/CAA Journal of Automatica Sinica, vol. 12, no. 3, pp. 502-515. Full text: https://ieeexplore.ieee.org/document/10815008/

Citations: 0

Abstract

The goal of infrared and visible image fusion (IVIF) is to integrate the unique advantages of both modalities to achieve a more comprehensive understanding of a scene. However, existing methods struggle to effectively handle modal disparities, resulting in visual degradation of the details and prominent targets of the fused images. To address these challenges, we introduce PromptFusion, a prompt-based approach that harmoniously combines multi-modality images under the guidance of semantic prompts. Firstly, to better characterize the features of different modalities, a contourlet autoencoder is designed to separate and extract the high-/low-frequency components of different modalities, thereby improving the extraction of fine details and textures. We also introduce a prompt learning mechanism using positive and negative prompts, leveraging Vision-Language Models to improve the fusion model's understanding and identification of targets in multi-modality images, leading to improved performance in downstream tasks. Furthermore, we employ bi-level asymptotic convergence optimization. This approach simplifies the intricate non-singleton non-convex bi-level problem into a series of convergent and differentiable single optimization problems that can be effectively resolved through gradient descent. Our approach advances the state-of-the-art, delivering superior fusion quality and boosting the performance of related downstream tasks. Project page: https://github.com/hey-it-s-me/PromptFusion.
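
As a rough illustration of the high-/low-frequency separation mentioned in the abstract, the sketch below splits each image into a low-frequency base and a high-frequency residual using a plain box blur, then fuses two images by averaging the bases and keeping the stronger detail coefficient at each pixel. This is a classical hand-crafted stand-in, not the paper's contourlet autoencoder, and it involves neither prompts nor Vision-Language Models. The names `low_pass`, `split_frequencies`, and `fuse`, the `radius` parameter, and the toy inputs are all hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def low_pass(img: Image, radius: int = 1) -> Image:
    """Box-blur low-pass filter; out-of-range neighbours are clamped to the border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def split_frequencies(img: Image, radius: int = 1) -> Tuple[Image, Image]:
    """Return (low, high) such that low + high reproduces img pixel by pixel."""
    low = low_pass(img, radius)
    high = [[p - l for p, l in zip(row, low_row)] for row, low_row in zip(img, low)]
    return low, high


def fuse(ir: Image, vis: Image, radius: int = 1) -> Image:
    """Average the low bands; keep the larger-magnitude high-band value per pixel."""
    ir_low, ir_high = split_frequencies(ir, radius)
    vis_low, vis_high = split_frequencies(vis, radius)
    h, w = len(ir), len(ir[0])
    fused = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            base = 0.5 * (ir_low[y][x] + vis_low[y][x])
            a, b = ir_high[y][x], vis_high[y][x]
            detail = a if abs(a) >= abs(b) else b
            fused[y][x] = base + detail
    return fused


if __name__ == "__main__":
    # Toy inputs (assumed): a bright "target" in the infrared image,
    # a vertical texture pattern in the visible image.
    ir = [[0.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 1.0, 0.0],
          [0.0, 1.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 0.0]]
    vis = [[0.2, 0.8, 0.2, 0.8] for _ in range(4)]
    for row in fuse(ir, vis):
        print([round(v, 3) for v in row])
```

The max-magnitude rule for the high band is a common hand-crafted choice for preserving edges from whichever modality has the stronger local detail; a learned approach such as the one the abstract describes would replace both the filter and the fusion rule with trained components.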

Source journal

IEEE/CAA Journal of Automatica Sinica (Engineering: Control and Systems Engineering)

CiteScore: 23.50

Self-citation rate: 11.00%

Articles published: 880

About the journal: The IEEE/CAA Journal of Automatica Sinica publishes papers in English on original theoretical and experimental research and development in the field of automation. Its scope includes automatic control, artificial intelligence and intelligent control, systems theory and engineering, pattern recognition and intelligent systems, automation engineering and applications, information processing and information systems, network-based automation, robotics, sensing and measurement, and navigation, guidance, and control. The journal is abstracted and indexed in SCIE (Science Citation Index Expanded), EI (Engineering Index), Inspec, Scopus, SCImago, DBLP, CNKI (China National Knowledge Infrastructure), CSCD (Chinese Science Citation Database), and IEEE Xplore.
