With the proliferation of large language models (LLMs), knowledge distillation (KD) has emerged as a promising methodology for addressing the large model sizes and high computational costs that hinder real-world deployment. However, existing KD methods often ignore variations in difficulty within the datasets used for distillation, leading to inefficient resource allocation and suboptimal training outcomes. In this paper, we propose Difficulty-Aware and Adaptive Distillation (D2A2), an efficient and performance-enhancing distillation framework. The key idea is to incorporate the inherent difficulty of problems, as indicated by the uncertainty LLMs exhibit when committing to their final predictions, into the distillation process. Specifically, we integrate this signal into both the data filtering and model training phases. In the difficulty-aware data filtering phase, we prioritize difficult samples for distillation based on their semantic uncertainty. In the difficulty-adaptive training phase, we dynamically sharpen the focus on challenging samples by updating the distillation loss according to the student model's performance. Comprehensive experiments demonstrate that our framework outperforms existing methods while using less data, and that it generalizes across various models and datasets.
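The two phases described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes semantic uncertainty is approximated by the entropy of a teacher LLM's sampled final answers, and that difficulty-adaptive training uses a focal-style re-weighting of per-sample losses; the paper's exact formulations may differ.

```python
import math
from collections import Counter


def semantic_uncertainty(sampled_answers):
    """Entropy of the distribution of sampled final answers.

    A crude proxy for semantic uncertainty: the more the teacher LLM's
    sampled answers disagree, the higher the entropy, and the harder
    the problem is assumed to be.
    """
    counts = Counter(sampled_answers)
    n = len(sampled_answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def filter_hard_samples(dataset, threshold):
    """Difficulty-aware filtering: keep samples whose uncertainty
    exceeds the threshold, so distillation focuses on hard cases."""
    return [ex for ex in dataset
            if semantic_uncertainty(ex["answers"]) > threshold]


def adaptive_weights(student_losses, gamma=2.0):
    """Difficulty-adaptive weighting (hypothetical focal-style scheme):
    up-weight samples on which the student's loss is high relative to
    the hardest sample in the batch."""
    m = max(student_losses)
    if m == 0:
        return [1.0 for _ in student_losses]
    return [(loss / m) ** gamma for loss in student_losses]
```

For example, a sample whose teacher answers are unanimous has zero uncertainty and is filtered out, while a sample with split answers is kept; during training, a sample with half the batch-maximum student loss receives weight 0.25 under `gamma=2.0`.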
