Correcting the distribution of batch normalization signals for Trojan mitigation

Neurocomputing · Pub Date: 2024-11-07 · DOI: 10.1016/j.neucom.2024.128752 · IF 5.5 · Q1 (Computer Science, Artificial Intelligence) · CAS Tier 2 (Computer Science)
Xi Li, Zhen Xiang, David J. Miller, George Kesidis
{"title":"Correcting the distribution of batch normalization signals for Trojan mitigation","authors":"Xi Li ,&nbsp;Zhen Xiang ,&nbsp;David J. Miller ,&nbsp;George Kesidis","doi":"10.1016/j.neucom.2024.128752","DOIUrl":null,"url":null,"abstract":"<div><div>Backdoor (Trojan) attacks represent a significant adversarial threat to deep neural networks (DNNs). In such attacks, the presence of an attacker’s backdoor trigger causes a test instance to be misclassified into the attacker’s chosen target class. Post-training mitigation methods aim to rectify these misclassifications, ensuring that poisoned models correctly classify backdoor-triggered samples. These methods require the defender to have access to a small, clean dataset and the potentially compromised DNN. However, most defenses rely on parameter fine-tuning, making their effectiveness dependent on the dataset size available to the defender. To overcome the limitations of existing approaches, we propose a method that rectifies misclassifications by correcting the altered distribution of internal layer activations of backdoor-triggered instances. Distribution alterations are corrected by applying simple transformations to internal activations. Notably, our method does not modify any trainable parameters of the DNN, yet it achieves generally good mitigation performance against various backdoor attacks and benchmarks. Consequently, our approach demonstrates robustness even with a limited amount of clean data, making it highly practical for real-world applications. The effectiveness of our approach is validated through both theoretical analysis and extensive experimentation. The appendix is provided as an electronic component and can be accessed via the link in the footnote.<span><span><sup>2</sup></span></span> The source codes can be found in the link<span><span><sup>3</sup></span></span> at the footnote.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128752"},"PeriodicalIF":5.5000,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231224015236","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Backdoor (Trojan) attacks represent a significant adversarial threat to deep neural networks (DNNs). In such attacks, the presence of an attacker’s backdoor trigger causes a test instance to be misclassified into the attacker’s chosen target class. Post-training mitigation methods aim to rectify these misclassifications, ensuring that poisoned models correctly classify backdoor-triggered samples. These methods require the defender to have access to a small, clean dataset and the potentially compromised DNN. However, most defenses rely on parameter fine-tuning, making their effectiveness dependent on the dataset size available to the defender. To overcome the limitations of existing approaches, we propose a method that rectifies misclassifications by correcting the altered distribution of internal layer activations of backdoor-triggered instances. Distribution alterations are corrected by applying simple transformations to internal activations. Notably, our method does not modify any trainable parameters of the DNN, yet it achieves generally good mitigation performance against various backdoor attacks and benchmarks. Consequently, our approach demonstrates robustness even with a limited amount of clean data, making it highly practical for real-world applications. The effectiveness of our approach is validated through both theoretical analysis and extensive experimentation. The appendix is provided as an electronic supplement and can be accessed via the link in the footnote. The source code can be found via the link in the footnote.
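To make the mechanism concrete, the following is a minimal sketch (in PyTorch, an assumption) of how an activation-distribution correction of this kind could look: per-channel reference statistics of the signals entering each BatchNorm layer are estimated on the small clean set, and incoming activations are affinely re-mapped to those statistics at inference time, without changing any trainable parameters. The helper names (`collect_clean_stats`, `install_bn_correction`) and the exact correction rule are hypothetical illustrations, not the authors' published algorithm; the released source code should be consulted for the actual method.

```python
# Illustrative approximation only: estimate clean per-channel statistics of
# pre-BatchNorm signals, then re-map those signals to the clean reference at
# inference time. Helper names and the correction rule are assumptions.
import torch
import torch.nn as nn


@torch.no_grad()
def collect_clean_stats(model, clean_loader, device="cpu"):
    """Estimate per-channel mean/std of the inputs to every BatchNorm2d layer,
    averaged over batches of a small clean dataset (hypothetical helper)."""
    per_batch, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            x = inputs[0].detach()                    # pre-BN signal, (N, C, H, W)
            per_batch.setdefault(name, []).append(
                (x.mean(dim=(0, 2, 3)), x.std(dim=(0, 2, 3)))
            )
        return hook

    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            handles.append(m.register_forward_hook(make_hook(name)))

    model.eval()
    for x, _ in clean_loader:                         # assumes (input, label) batches
        model(x.to(device))
    for h in handles:
        h.remove()

    # Simple average of per-batch statistics (an approximation, fine for a sketch).
    return {
        name: (torch.stack([m for m, _ in v]).mean(0),
               torch.stack([s for _, s in v]).mean(0) + 1e-5)
        for name, v in per_batch.items()
    }


def install_bn_correction(model, clean_stats):
    """Register forward pre-hooks that standardize each batch's pre-BN signal
    per channel and re-map it to the clean reference statistics. No weights
    or BN parameters of the network are modified."""
    def make_pre_hook(mu_ref, sigma_ref):
        def pre_hook(module, inputs):
            x = inputs[0]
            mu = x.mean(dim=(0, 2, 3), keepdim=True)
            sigma = x.std(dim=(0, 2, 3), keepdim=True) + 1e-5
            x_std = (x - mu) / sigma                  # standardize current batch
            return (x_std * sigma_ref.view(1, -1, 1, 1)
                    + mu_ref.view(1, -1, 1, 1),)      # match clean distribution
        return pre_hook

    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d) and name in clean_stats:
            mu_ref, sigma_ref = clean_stats[name]
            m.register_forward_pre_hook(make_pre_hook(mu_ref, sigma_ref))
```

Note that this sketch applies the correction using batch-level statistics at inference, which is itself an assumption about how the re-mapping is performed; the paper defines the actual transformation and where in the network it is applied.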
Source journal: Neurocomputing (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 13.10 · Self-citation rate: 10.00% · Articles per year: 1382 · Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.