Large language models (LLMs) excel at diverse tasks but face deployment challenges due to their massive size. One-shot pruning reduces computational costs by introducing parameter sparsity, yet pruned models often suffer from performance degradation, necessitating fine-tuning. Existing fine-tuning methods for sparse models, such as DSØT [1], rely on heuristic algorithms to update sparsity masks. These training-free approximation strategies can lead to suboptimal outcomes. Moreover, during continual fine-tuning across tasks, accumulated mask updates can cause catastrophic forgetting as new updates overwrite previous configurations. To address these issues, we propose Group-shared Continual Learning (GCL), a fine-tuning framework specifically designed for sparse LLMs. GCL updates model weights through training rather than modifying sparsity masks, thereby preserving sparsity while avoiding suboptimal solutions. The framework employs dependency-aware row-column optimization parameters and a group-wise sharing strategy, balancing performance and efficiency. To further mitigate catastrophic forgetting, we model parameter regularization as bio-inspired synaptic plasticity, deriving gradient-aware constraints from a Taylor expansion of the error. Compared with Hessian-based methods [2], our approach reduces computational complexity from O(N²) to O(N). GCL is compatible with diverse sparsity configurations, including unstructured and N:M formats, and integrates seamlessly with existing pruning techniques. Experimental evaluations on LLaMA-V1/V2 models demonstrate that GCL outperforms prior methods in performance recovery and stability across continual tasks while preserving model sparsity.
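The two core ideas above — training weights under a fixed sparsity mask (so the mask is never modified) and replacing an O(N²) Hessian with an O(N) per-parameter, gradient-derived importance term — can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the 2:4 mask construction, the squared-gradient importance, and the quadratic (EWC-style) forgetting penalty are all assumptions introduced for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weight matrix with a fixed 2:4 mask: in every group of 4 weights,
# keep the 2 largest magnitudes (one-shot magnitude pruning).
W = rng.normal(size=(4, 8))
groups = np.abs(W).reshape(-1, 4)
keep = np.argsort(groups, axis=1)[:, 2:]           # indices of top-2 per group
mask = np.zeros_like(groups)
np.put_along_axis(mask, keep, 1.0, axis=1)
mask = mask.reshape(W.shape)
W *= mask                                          # pruned weights

# O(N) importance: squared gradients from the previous task (a diagonal
# stand-in for the paper's gradient-aware, Taylor-derived constraint).
old_grad = rng.normal(size=W.shape) * mask
importance = old_grad ** 2
W_old = W.copy()

lr, lam = 1e-2, 1.0
for _ in range(10):
    task_grad = rng.normal(size=W.shape)           # placeholder new-task gradient
    reg_grad = 2 * lam * importance * (W - W_old)  # quadratic penalty gradient
    W -= lr * mask * (task_grad + reg_grad)        # masked step: sparsity intact

# The fixed mask guarantees exactly 2 nonzeros per group of 4 throughout.
assert np.all((np.abs(W).reshape(-1, 4) > 0).sum(axis=1) == 2)
```

Because the gradient step is multiplied by the mask, pruned positions stay exactly zero across all tasks, while the diagonal importance term penalizes drift in weights that mattered for earlier tasks at O(N) memory and compute.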
