In recent years, image generation techniques based on diffusion models have made significant progress in facial stylization. However, existing methods still struggle to achieve high identity fidelity while maintaining strong stylistic expressiveness, particularly in balancing the geometric deformations introduced by stylization against the preservation of fine identity cues such as facial structure and pose. To address this issue, this paper proposes OIDSty, a novel single-sample facial stylization system. Its core innovation lies in decoupling identity preservation and style injection across distinct attention layers, realized through two key designs: (1) a High-Fidelity Identity Module, which combines strong semantic conditions with weak spatial conditions to guide the cross-attention layers, enabling precise retention of core identity and facial layout while still permitting stylized geometric deformation; (2) a DINO-Style Texture Guidance Module, which defines a style loss on the self-attention layers as the feature difference between the ideal stylized output and the current output. This loss is integrated into the denoising sampling process and dynamically calibrates the latent features through its gradients, ensuring efficient and accurate transfer of stylized textures onto the target image. Extensive experiments demonstrate that OIDSty generates high-fidelity, stylistically distinct images across multiple styles. Compared with existing state-of-the-art methods, our method shows significant advantages on all objective and subjective evaluation metrics without requiring complex parameter tuning.
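To make the gradient-based latent calibration described above concrete, the following is a minimal sketch of one guided denoising step, not the paper's actual implementation. It assumes a diffusers-style `unet` and `scheduler`, and a hypothetical helper `extract_dino_features` that maps a predicted clean latent to DINO-space features comparable with the reference features of the ideal stylized output; the `guidance_scale` parameter is likewise an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def dino_style_guided_step(latent, t, unet, scheduler, dino_features_ref,
                           extract_dino_features, guidance_scale=1.0):
    """One denoising step with DINO-based style guidance (hypothetical sketch).

    `dino_features_ref` are features of the ideal stylized output;
    `extract_dino_features` is an assumed helper mapping a latent prediction
    to comparable features. Neither name comes from the paper.
    """
    latent = latent.detach().requires_grad_(True)

    # Predict noise and form the approximate clean latent x0 at step t.
    noise_pred = unet(latent, t).sample
    alpha_bar = scheduler.alphas_cumprod[t]
    x0_pred = (latent - (1 - alpha_bar).sqrt() * noise_pred) / alpha_bar.sqrt()

    # Style loss: feature difference between the current prediction and the
    # ideal stylized output, measured in DINO feature space.
    feats_cur = extract_dino_features(x0_pred)
    style_loss = F.mse_loss(feats_cur, dino_features_ref)

    # Calibrate the latent along the negative gradient of the style loss.
    grad = torch.autograd.grad(style_loss, latent)[0]
    latent_calibrated = latent - guidance_scale * grad

    # Standard scheduler update applied to the calibrated latent.
    return scheduler.step(noise_pred.detach(), t,
                          latent_calibrated.detach()).prev_sample
```

In this sketch the style gradient acts at every sampling step, so texture guidance accumulates gradually rather than being imposed in a single post-hoc correction.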
