Image fusion integrates complementary features from source images to enhance human and machine vision. Existing methods face two key limitations: (1) they prioritize visual quality over semantic representation, which limits downstream task performance, and (2) they rely on spatial-domain features and neglect high-frequency details such as textures and edges. To address these limitations, we propose SFIFusion, a task-oriented network for semantic-frequency feature fusion of infrared and visible images. SFIFusion incorporates a Semantic Enhancement Block (SEB) for deep semantic feature extraction, aligned with visual details via DINOv2 to ensure semantic consistency. The enriched semantic features are then fed back into the fusion process, so that the final fused image is both visually refined and semantically robust. SFIFusion also introduces a Frequency Enhancement Block (FEB), which uses the Fourier transform to decompose images into amplitude (texture/style) and phase (structural details), preserving amplitude for visual richness and combining phase for structural integrity. Experiments show that SFIFusion outperforms current methods in visual quality, quantitative metrics, and downstream tasks such as object detection and semantic segmentation, demonstrating its practical applicability in complex scenarios. The source code is available at https://github.com/Zzuouo/SFIFusion.
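The amplitude/phase decomposition attributed to the FEB can be illustrated with a minimal NumPy sketch. This is not the SFIFusion implementation: the function names (`decompose`, `recompose`, `naive_frequency_fusion`) are hypothetical, and the fusion rules used here (per-frequency maximum amplitude, phase taken from the sum of unit phasors) are simple stand-ins chosen only to show how the two components can be separated and recombined. The actual FEB is a learned block whose details are given in the paper and repository.

```python
import numpy as np

def decompose(img):
    """Split a 2-D image into Fourier amplitude and phase."""
    spectrum = np.fft.fft2(img)
    return np.abs(spectrum), np.angle(spectrum)

def recompose(amplitude, phase):
    """Rebuild a spatial image from amplitude and phase."""
    spectrum = amplitude * np.exp(1j * phase)
    # The combined spectrum may not be Hermitian, so keep the real part only.
    return np.real(np.fft.ifft2(spectrum))

def naive_frequency_fusion(ir, vis):
    """Hand-crafted stand-in for a learned frequency fusion (assumption)."""
    amp_ir, pha_ir = decompose(ir)
    amp_vis, pha_vis = decompose(vis)
    # Assumed rule: keep the stronger amplitude at each frequency.
    amplitude = np.maximum(amp_ir, amp_vis)
    # Assumed rule: combine phases as the angle of the summed unit phasors,
    # which avoids wrap-around problems of averaging raw angles.
    phase = np.angle(np.exp(1j * pha_ir) + np.exp(1j * pha_vis))
    return recompose(amplitude, phase)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir = rng.random((64, 64))    # placeholder for an infrared image
    vis = rng.random((64, 64))   # placeholder for a visible image
    fused = naive_frequency_fusion(ir, vis)
    print(fused.shape)
```

Decomposing and then recomposing a single image recovers it up to floating-point error, which is a quick sanity check that the amplitude and phase together carry all of the image information.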
