Zero-shot learning (ZSL) addresses the critical challenge of recognizing and classifying instances from categories not seen during training. Although generative model-based approaches have achieved notable success in ZSL, their predominant reliance on forward generation strategies, coupled with excessive dependence on auxiliary information, hampers model generalization and robustness. To overcome these limitations, we propose a Mixed-Semantic Enhancement framework inspired by interpolation-based feature extraction. This approach synthesizes enriched auxiliary information by integrating authentic semantic cues, thereby refining the mapping from semantic descriptions to visual features. The enhanced feature synthesis capability enables better discrimination of ambiguous classes while preserving inter-class relationships. In addition, we establish bidirectional alignment between visual features and auxiliary information. This cross-modal interaction mechanism not only strengthens the generator's training process through feature-consistency constraints but also facilitates dynamic information exchange between the two modalities. Extensive experiments in the transductive setting on four benchmark datasets demonstrate significant performance gains, highlighting the robustness and effectiveness of our approach in advancing generative ZSL models.
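The interpolation-based semantic enhancement described above can be illustrated with a minimal mixup-style sketch. The abstract does not specify the exact mixing rule, so the Beta-sampled coefficient, the function name `mix_semantics`, and its parameters are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def mix_semantics(attr_a, attr_b, alpha=0.4, rng=None):
    """Illustrative mixup-style interpolation of two class attribute
    (semantic) vectors; the mixing coefficient is drawn from a
    Beta(alpha, alpha) distribution. This is an assumed sketch of
    'mixed-semantic' synthesis, not the paper's exact method."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # convex mixing weight in (0, 1)
    mixed = lam * attr_a + (1.0 - lam) * attr_b
    return mixed, lam

# Example: blend the attribute vectors of two classes into one
# enriched semantic description that a generator could condition on.
a = np.array([0.9, 0.1, 0.0, 0.7])  # hypothetical class-A attributes
b = np.array([0.2, 0.8, 0.5, 0.3])  # hypothetical class-B attributes
mixed, lam = mix_semantics(a, b)
```

Each mixed vector lies on the line segment between the two authentic semantic vectors, which is one simple way to enrich the auxiliary-information space while preserving inter-class relationships.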
