Generative steganography has attracted considerable attention because of its security. However, different platforms and social environments favor different modalities, while existing generative steganography techniques are typically restricted to a single modality. Inspired by advances in inpainting, we observe that the inpainting process is inherently generative. Moreover, inpainting in every modality minimally perturbs the regions outside the mask and follows the same mask-and-fill procedure. Based on these insights, we introduce StegaFusion, a novel framework that unifies multimodal generative steganography. StegaFusion relies on generation seeds and conditional information shared between sender and receiver, which enables the receiver to deterministically reconstruct the reference content. The receiver then performs differential analysis between this reference and the inpainting-generated stego content to extract the secret message. Compared with traditional unimodal methods, StegaFusion improves controllability, security, compatibility, and interpretability without requiring additional model training. To the best of our knowledge, StegaFusion is the first framework to formalize and unify cross-modal generative steganography, and it is broadly applicable. Extensive qualitative and quantitative experiments demonstrate the superior performance of StegaFusion in terms of controllability, security, and cross-modal compatibility.
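To make the shared-seed and differential-extraction idea concrete, the following is a toy sketch only, not the StegaFusion method. It assumes a hypothetical scheme in which content is a list of integer "blocks," a seeded pseudo-random generator stands in for the generative model, a block is masked and refilled ("inpainted") to encode a 1 bit and left untouched to encode a 0 bit. All function names (`generate_reference`, `inpaint_block`, `embed`, `extract`) and the bit-to-mask mapping are illustrative assumptions; a real system would use a diffusion-style inpainting model per modality.

```python
import random
from typing import List

BLOCK_VALUES = 256  # toy "content": each block is an integer in [0, 256)


def generate_reference(seed: int, condition: str, n_blocks: int) -> List[int]:
    """Hypothetical stand-in for a generative model.

    The same (seed, condition) pair always yields the same content, which is
    what lets the receiver reconstruct the reference deterministically.
    """
    rng = random.Random(f"{seed}:{condition}")  # str seeds are hashed deterministically
    return [rng.randrange(BLOCK_VALUES) for _ in range(n_blocks)]


def inpaint_block(seed: int, condition: str, index: int, original: int) -> int:
    """Toy 'inpainting': refill a masked block with a value that always differs."""
    rng = random.Random(f"{seed}:{condition}:fill:{index}")
    offset = 1 + rng.randrange(BLOCK_VALUES - 1)  # offset in [1, 255], never 0
    return (original + offset) % BLOCK_VALUES


def embed(bits: List[int], seed: int, condition: str) -> List[int]:
    """Sender: mask and refill block i when bits[i] == 1; leave it unchanged otherwise."""
    stego = generate_reference(seed, condition, len(bits))
    for i, bit in enumerate(bits):
        if bit:
            stego[i] = inpaint_block(seed, condition, i, stego[i])
    return stego


def extract(stego: List[int], seed: int, condition: str) -> List[int]:
    """Receiver: regenerate the reference, then diff it block by block."""
    reference = generate_reference(seed, condition, len(stego))
    return [int(s != r) for s, r in zip(stego, reference)]


if __name__ == "__main__":
    secret = [1, 0, 1, 1, 0, 0, 1, 0]
    stego = embed(secret, seed=42, condition="a cat on a sofa")
    recovered = extract(stego, seed=42, condition="a cat on a sofa")
    print(recovered == secret)  # True
```

The sketch shows why sharing the seed and condition matters: only a receiver who holds both can regenerate the exact reference, so the difference map carries the message, while unmasked blocks are left bit-for-bit identical.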
