Diffusion models (DMs) have shown great promise in image inpainting by modeling complex data distributions and generating high-quality reconstructions. However, current diffusion-based methods often face challenges such as excessive iterative steps and limited adaptability to both local and global features, resulting in high computational costs and suboptimal restoration quality. To address these issues, we propose Diff-GDAformer, a novel image inpainting framework that combines diffusion-based prior feature generation with a guided dynamic attention Transformer (GDAformer) for robust and efficient restoration. In our approach, the DM iteratively refines Gaussian noise in a compressed latent space to generate high-quality prior features, which guide the restoration process. These prior features are injected into GDAformer, which adopts a novel dynamic recursive local attention (DRLA) module. DRLA employs two complementary attention mechanisms: guided local self-attention (GL-SA) and guided recursive-generalized self-attention (GRG-SA). GL-SA enhances the model's ability to capture fine-grained local details, while GRG-SA efficiently aggregates global contextual information. To bridge the gap between local and global features, we introduce the hybrid feature integration (HFI) module, which effectively fuses features from different attention layers, enabling a more comprehensive understanding of image contexts. A two-stage training strategy couples GDAformer training with DM optimization, ensuring that the extracted prior features are accurate and seamlessly integrated into the restoration pipeline. Extensive experiments demonstrate that Diff-GDAformer achieves state-of-the-art performance on standard benchmarks, delivering superior visual quality and computational efficiency compared to existing methods. The code is available at https://github.com/w1zzzzzWu/Diff-GDAformer.
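To make the described pipeline concrete, the following is a minimal sketch, not the authors' implementation, of how diffusion-generated prior features might be injected into a transformer block that combines a guided local attention path and a guided global attention path and then fuses the two, as the abstract outlines. All internal details, including the window size, head counts, the pooled-token form of the global attention, and the gated fusion rule in HFI, are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GuidedLocalSelfAttention(nn.Module):
    """Assumed GL-SA: window-based self-attention modulated by the prior features."""
    def __init__(self, dim, window=8, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.prior_proj = nn.Linear(dim, dim)  # inject the prior as an additive guide

    def forward(self, x, prior):                              # x, prior: (B, N, C)
        b, n, c = x.shape
        g = x + self.prior_proj(prior)                        # guidance-injected tokens
        gw = g.view(b * (n // self.window), self.window, c)   # fixed-size local windows
        out, _ = self.attn(gw, gw, gw)
        return out.view(b, n, c)

class GuidedRecursiveGlobalAttention(nn.Module):
    """Assumed GRG-SA: attention over a small set of pooled global context tokens."""
    def __init__(self, dim, num_global=16, heads=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(num_global)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.prior_proj = nn.Linear(dim, dim)

    def forward(self, x, prior):
        g = x + self.prior_proj(prior)
        ctx = self.pool(g.transpose(1, 2)).transpose(1, 2)    # (B, num_global, C)
        out, _ = self.attn(g, ctx, ctx)                       # queries attend to pooled context
        return out

class HybridFeatureIntegration(nn.Module):
    """Assumed HFI: learned gating between the local and global branches."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, local_feat, global_feat):
        g = self.gate(torch.cat([local_feat, global_feat], dim=-1))
        return g * local_feat + (1 - g) * global_feat

class DRLABlock(nn.Module):
    """One dynamic recursive local attention block guided by diffusion prior features."""
    def __init__(self, dim):
        super().__init__()
        self.local_attn = GuidedLocalSelfAttention(dim)
        self.global_attn = GuidedRecursiveGlobalAttention(dim)
        self.fuse = HybridFeatureIntegration(dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, prior):
        y = self.fuse(self.local_attn(self.norm(x), prior),
                      self.global_attn(self.norm(x), prior))
        return x + y

# Toy usage: a 32x32 feature map with 64 channels and prior features of the same shape.
tokens = torch.randn(2, 32 * 32, 64)
prior = torch.randn(2, 32 * 32, 64)
print(DRLABlock(64)(tokens, prior).shape)   # torch.Size([2, 1024, 64])
```

In this sketch the prior features play the role of the diffusion-generated guidance described above, and the gated fusion stands in for HFI's cross-layer feature integration; the actual Diff-GDAformer modules may differ substantially.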
