Machine unlearning allows participants to remove their data from a trained machine learning model in order to preserve their privacy and security. However, the machine unlearning literature for generative models is rather limited. Prior work on image-to-image (I2I) generative models treats unlearning as minimizing the loss between Gaussian noise and the model's output on forget samples. We argue, however, that machine learning models perform fairly well on unseen data: a retrained model learns generalized representations of the data and therefore would not produce Gaussian noise on forget samples. In this paper, we instead posit that the unlearned model should treat forget samples as out-of-distribution (OOD) data, i.e., it should no longer recognize or encode the specific patterns found in the forget samples. To achieve this, we propose a framework that decouples the model parameters via gradient ascent. Our framework guarantees, with theoretical justification, that forget samples become OOD for the unlearned model. We also provide an (ϵ, δ)-unlearning guarantee for model updates with gradient ascent. The unlearned model is then fine-tuned on the remaining samples to maintain its performance. We further propose a data-poisoning attack as an auditing mechanism to verify that the unlearned model has effectively removed the influence of the forget samples, and we demonstrate that, even under sample-level unlearning, our approach prevents backdoor regeneration, validating its effectiveness. Extensive empirical evaluation on two large-scale datasets, ImageNet-1K and Places365, highlights the superiority of our approach. To demonstrate performance comparable to that of a retrained model, we also compare a simple AutoEncoder against various baselines on the CIFAR-10 dataset.
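The core procedure described above, gradient ascent on the forget samples followed by gradient descent fine-tuning on the retained samples, can be sketched in miniature. This is a toy illustration only: the one-parameter linear "model" and the function names (`unlearn_step`, `fine_tune_step`) are our own assumptions, standing in for the paper's actual I2I model and framework.

```python
import numpy as np

# Toy stand-in for a trained model: y = w * x with a single scalar weight w.
def loss(w, x, y):
    return np.mean((w * x - y) ** 2)

def grad(w, x, y):
    return np.mean(2 * (w * x - y) * x)

def unlearn_step(w, x_forget, y_forget, lr=0.1):
    # Gradient *ascent*: move the parameter to INCREASE the loss on forget
    # samples, pushing them toward out-of-distribution for the model.
    return w + lr * grad(w, x_forget, y_forget)

def fine_tune_step(w, x_retain, y_retain, lr=0.1):
    # Standard gradient descent on retained samples to restore utility.
    return w - lr * grad(w, x_retain, y_retain)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x                      # ground-truth weight is 2.0
x_forget, y_forget = x[:20], y[:20]
x_retain, y_retain = x[20:], y[20:]

w = 1.8                          # a (nearly) trained starting weight
w_unlearned = unlearn_step(w, x_forget, y_forget)
# Ascent raises the forget-set loss; fine-tuning then recovers retain-set
# performance.
w_final = w_unlearned
for _ in range(50):
    w_final = fine_tune_step(w_final, x_retain, y_retain)
```

In a real I2I setting the scalar update would be replaced by parameter-wise updates to the generator, but the alternation (ascent on forget data, descent on retained data) is the same.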
