PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning

arXiv - CS - Cryptography and Security | Pub Date: 2024-09-18 | DOI: arxiv-2409.12072

Yukai Xu, Yujie Gu, Kouichi Sakurai

Abstract

Backdoor attacks pose a significant threat to deep neural networks, particularly as recent advancements have led to increasingly subtle implantation, making the defense more challenging. Existing defense mechanisms typically rely on an additional clean dataset as a standard reference and involve retraining an auxiliary model or fine-tuning the entire victim model. However, these approaches are often computationally expensive and not always feasible in practical applications. In this paper, we propose a novel and lightweight defense mechanism, termed PAD-FT, that does not require an additional clean dataset and fine-tunes only a very small part of the model to disinfect the victim model. To achieve this, our approach first introduces a simple data purification process to identify and select the most-likely clean data from the poisoned training dataset. The self-purified clean dataset is then used for activation clipping and fine-tuning only the last classification layer of the victim model. By integrating data purification, activation clipping, and classifier fine-tuning, our mechanism PAD-FT demonstrates superior effectiveness across multiple backdoor attack methods and datasets, as confirmed through extensive experimental evaluation.
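The three stages named in the abstract (data purification, activation clipping, and fine-tuning of the last classification layer) can be illustrated with a toy sketch. The abstract does not say how clean samples are scored or how clipping bounds are chosen, so everything specific here is an assumption made for illustration: the `scores` values, the `keep_ratio` parameter, the per-dimension maximum used as the clipping bound, the three-dimensional "penultimate activations", and the binary logistic classifier standing in for the victim model's last layer. It is not the authors' implementation.

```python
import math

def select_most_likely_clean(scores, keep_ratio):
    """Keep the indices of the highest-scoring samples.
    `scores` are hypothetical clean-likelihood values, not the paper's criterion."""
    k = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def clipping_bounds(features):
    """Assumed rule: per-dimension maximum activation over the purified set."""
    return [max(column) for column in zip(*features)]

def clip(feature, bounds):
    """Cap each activation at its bound."""
    return [min(value, bound) for value, bound in zip(feature, bounds)]

def logit(feature, weights, bias):
    return sum(w * x for w, x in zip(weights, feature)) + bias

def predict(feature, weights, bias):
    return 1 if logit(feature, weights, bias) > 0 else 0

def fine_tune_last_layer(features, labels, weights, bias, lr=0.5, epochs=200):
    """Full-batch gradient descent on the classifier only.
    The activations are treated as outputs of a frozen backbone."""
    w, b, n = list(weights), bias, len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(features, labels):
            err = 1.0 / (1.0 + math.exp(-logit(x, w, b))) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy penultimate-layer activations: [class-0 evidence, class-1 evidence, trigger neuron].
activations = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.1],
    [0.2, 0.8, 0.0],
    [0.9, 0.1, 5.0],  # poisoned: class-0 content, but the trigger neuron spikes
]
labels = [0, 0, 1, 1, 1]             # the poisoned sample carries target label 1
scores = [0.9, 0.8, 0.9, 0.85, 0.1]  # hypothetical clean-likelihood scores

# Stage 1: purification keeps the most-likely clean samples.
clean_idx = select_most_likely_clean(scores, keep_ratio=0.8)
clean_x = [activations[i] for i in clean_idx]
clean_y = [labels[i] for i in clean_idx]

# Stage 2: clipping bounds are estimated from the purified set only.
bounds = clipping_bounds(clean_x)

# Stage 3: fine-tune the last layer, starting from the (backdoored) victim weights.
victim_w, victim_b = [-1.0, 1.0, 2.0], 0.0
new_w, new_b = fine_tune_last_layer(clean_x, clean_y, victim_w, victim_b)

triggered = activations[4]
print("victim, no clipping:   ", predict(triggered, victim_w, victim_b))
print("clipped and fine-tuned:", predict(clip(triggered, bounds), new_w, new_b))
```

In this toy setting the large trigger activation is capped at the largest value seen on the purified data, so it can no longer dominate the classifier's logit, and only the classifier weights are updated. A real model would apply the same idea to the activations of an actual network, with the backbone parameters frozen.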
