PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning

Yukai Xu, Yujie Gu, Kouichi Sakurai
{"title":"PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning","authors":"Yukai Xu, Yujie Gu, Kouichi Sakurai","doi":"arxiv-2409.12072","DOIUrl":null,"url":null,"abstract":"Backdoor attacks pose a significant threat to deep neural networks,\nparticularly as recent advancements have led to increasingly subtle\nimplantation, making the defense more challenging. Existing defense mechanisms\ntypically rely on an additional clean dataset as a standard reference and\ninvolve retraining an auxiliary model or fine-tuning the entire victim model.\nHowever, these approaches are often computationally expensive and not always\nfeasible in practical applications. In this paper, we propose a novel and\nlightweight defense mechanism, termed PAD-FT, that does not require an\nadditional clean dataset and fine-tunes only a very small part of the model to\ndisinfect the victim model. To achieve this, our approach first introduces a\nsimple data purification process to identify and select the most-likely clean\ndata from the poisoned training dataset. The self-purified clean dataset is\nthen used for activation clipping and fine-tuning only the last classification\nlayer of the victim model. By integrating data purification, activation\nclipping, and classifier fine-tuning, our mechanism PAD-FT demonstrates\nsuperior effectiveness across multiple backdoor attack methods and datasets, as\nconfirmed through extensive experimental evaluation.","PeriodicalId":501332,"journal":{"name":"arXiv - CS - Cryptography and Security","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Cryptography and Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.12072","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Backdoor attacks pose a significant threat to deep neural networks, particularly as recent advancements have led to increasingly subtle implantation, making the defense more challenging. Existing defense mechanisms typically rely on an additional clean dataset as a standard reference and involve retraining an auxiliary model or fine-tuning the entire victim model. However, these approaches are often computationally expensive and not always feasible in practical applications. In this paper, we propose a novel and lightweight defense mechanism, termed PAD-FT, that does not require an additional clean dataset and fine-tunes only a very small part of the model to disinfect the victim model. To achieve this, our approach first introduces a simple data purification process to identify and select the most-likely clean data from the poisoned training dataset. The self-purified clean dataset is then used for activation clipping and fine-tuning only the last classification layer of the victim model. By integrating data purification, activation clipping, and classifier fine-tuning, our mechanism PAD-FT demonstrates superior effectiveness across multiple backdoor attack methods and datasets, as confirmed through extensive experimental evaluation.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
PAD-FT:通过数据净化和微调实现对后门攻击的轻量级防御
后门攻击对深度神经网络构成了重大威胁,尤其是最近的技术进步导致了越来越微妙的植入,使得防御更具挑战性。现有的防御机制通常依赖于额外的干净数据集作为标准参考,并涉及重新训练辅助模型或微调整个受害者模型。然而,这些方法通常计算成本高昂,在实际应用中并不总是可行的。在本文中,我们提出了一种称为 PAD-FT 的新型轻量级防御机制,它不需要额外的干净数据集,只需对模型的一小部分进行微调,即可感染受害者模型。为了实现这一目标,我们的方法首先引入了一个简单的数据净化过程,从中毒训练数据集中识别并选择最有可能的干净数据。然后,自我净化的干净数据集仅用于受害者模型最后一个分类层的激活削波和微调。通过整合数据净化、激活剪切和分类器微调,我们的机制 PAD-FT 在多种后门攻击方法和数据集上都表现出了更高的有效性,这一点已经通过广泛的实验评估得到了证实。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning Artemis: Efficient Commit-and-Prove SNARKs for zkML A Survey-Based Quantitative Analysis of Stress Factors and Their Impacts Among Cybersecurity Professionals Log2graphs: An Unsupervised Framework for Log Anomaly Detection with Efficient Feature Extraction Practical Investigation on the Distinguishability of Longa's Atomic Patterns
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1