SCDFuse: A semantic complementary distillation framework for joint infrared and visible image fusion and denoising

Shidong Xie, Haiyan Li, Yongsheng Zang, Jinde Cao, Dongming Zhou, Mingchuan Tan, Zhaisheng Ding, Guanbo Wang

Knowledge-Based Systems, Volume 315, Article 113262 (2025). DOI: 10.1016/j.knosys.2025.113262
Abstract
Infrared and visible image fusion has gained unprecedented attention due to its extensive applications in computer vision. However, existing algorithms focus solely on fusing clean scene images and are vulnerable to noise interference. Although this issue can be mitigated by deploying independent pre-denoising modules, cascading additional modules with diverse functionalities introduces extra complexity, computational overhead, and even inter-module interference. To overcome this limitation and unify the two tasks, we propose a knowledge distillation framework for end-to-end simultaneous feature denoising and aggregation. In this framework, we leverage the distillation architecture to generate soft labels, mitigating the unstable fusion performance caused by the lack of label guidance. To guide function learning accurately during training, an asymmetric noise-aware training strategy is devised to promote both aggregation robustness and denoising capability. Moreover, to ensure feature excavation and semantic complementarity, a hybrid series–parallel CNN-transformer dual-branch En-Decoder is constructed. The proposed encoder incorporates a self-designed Textural-aware ConvNextV2, strip pooling attention, and a progressive residual transformer to compose the dual-branch architecture. In addition, a semantic complementary feature aggregation (SCFA) module is developed to realize coarse-to-fine feature enhancement. Extensive experiments on both regular and noisy fusion data verify the integration and denoising performance of the proposed method. Notably, on the TNO dataset, the proposed method improves the MSSIM and UQI metrics by 4% and 4.2%, respectively, over the second-best algorithm. Furthermore, we also investigate its benefit for advanced visual tasks through object detection experiments.
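To make the asymmetric noise-aware distillation idea concrete, the following is a minimal PyTorch-style sketch written from the abstract alone. The module names (SimpleFusionNet standing in for the dual-branch CNN-transformer En-Decoder), the noise level, and the L1 distillation loss are illustrative assumptions, not the authors' released implementation: a frozen teacher produces soft fused labels from clean image pairs, while the student must reproduce them from noise-corrupted inputs, so fusion and denoising are learned jointly end to end.

# Hypothetical sketch of the asymmetric noise-aware distillation step (not the paper's code).
import torch
import torch.nn as nn

class SimpleFusionNet(nn.Module):
    """Toy stand-in for the dual-branch CNN-transformer En-Decoder."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.GELU())
        self.decoder = nn.Conv2d(16, 1, 3, padding=1)

    def forward(self, ir, vis):
        feats = self.encoder(torch.cat([ir, vis], dim=1))
        return self.decoder(feats)

teacher = SimpleFusionNet().eval()   # frozen teacher: sees clean inputs, emits soft labels
student = SimpleFusionNet()          # student: sees noisy inputs (asymmetric strategy)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
criterion = nn.L1Loss()

def train_step(ir_clean, vis_clean, sigma=0.05):
    # Teacher fuses the clean pair; its output serves as the soft label.
    with torch.no_grad():
        soft_label = teacher(ir_clean, vis_clean)
    # Student receives noise-corrupted inputs, so matching the soft label
    # requires it to denoise and fuse simultaneously.
    ir_noisy = ir_clean + sigma * torch.randn_like(ir_clean)
    vis_noisy = vis_clean + sigma * torch.randn_like(vis_clean)
    fused = student(ir_noisy, vis_noisy)
    loss = criterion(fused, soft_label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

A single call such as train_step(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)) exercises the loop; in the paper the teacher, the dual-branch encoder, and the SCFA aggregation are considerably richer than this stand-in.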
Journal Introduction:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.