TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

Yichuan Mo, Hui Huang, Mingjie Li, Ang Li, Yisen Wang
arXiv - CS - Cryptography and Security, published 2024-09-09 (arXiv:2409.05294)

Abstract

Diffusion models have achieved notable success in image generation, but they remain highly vulnerable to backdoor attacks, which compromise model integrity by producing specific undesirable outputs whenever the input contains a pre-defined trigger. In this paper, we investigate how to protect diffusion models from this threat. Specifically, we propose TERD, a backdoor defense framework that builds a unified model of current attacks, which enables us to derive an accessible reversed loss. A trigger reversion strategy is then employed: the trigger is first approximated using noise sampled from a prior distribution, and then refined through differential multi-step samplers. With the reversed trigger, we further propose backdoor detection in the noise space, introducing the first backdoor input detection approach for diffusion models and a novel model detection algorithm that calculates the KL divergence between the reversed and benign distributions. Extensive evaluations demonstrate that TERD achieves a 100% True Positive Rate (TPR) and a 100% True Negative Rate (TNR) across datasets of varying resolutions. TERD also adapts well to other Stochastic Differential Equation (SDE)-based models. Our code is available at https://github.com/PKU-ML/TERD.
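The model detection idea in the abstract, comparing a reversed distribution against a benign one through a KL divergence, can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). It assumes, purely for illustration, that the benign noise distribution is a standard normal, that the reversed trigger is summarized by a single fitted Gaussian, and that a model is flagged when the divergence exceeds a hypothetical threshold of 0.1. The function names are also invented for this example.

```python
import math
import random


def kl_to_standard_normal(values):
    """Closed-form KL( N(mu, var) || N(0, 1) ) for a Gaussian fitted to `values`.

    Assumption: the benign distribution is N(0, 1); the abstract does not say so.
    """
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    var = max(var, 1e-12)  # guard against log(0) for degenerate inputs
    return 0.5 * (var + mu * mu - 1.0 - math.log(var))


def looks_backdoored(reversed_trigger, threshold=0.1):
    """Flag a model when its reversed trigger is far from the benign prior.

    `threshold` is a hypothetical value chosen for this sketch only.
    """
    return kl_to_standard_normal(reversed_trigger) > threshold


# Toy data: a benign-looking reversed trigger and one with a shifted mean.
rng = random.Random(0)
benign_like = [rng.gauss(0.0, 1.0) for _ in range(10000)]
shifted = [rng.gauss(0.0, 1.0) + 2.0 for _ in range(10000)]

print(looks_backdoored(benign_like))
print(looks_backdoored(shifted))
```

In this toy setting a reversed trigger that is statistically indistinguishable from the prior yields a divergence near zero, while a mean shift of 2 yields a divergence near 2, so the two cases separate cleanly. Real reversed triggers are high-dimensional, and the actual statistic and decision rule should be taken from the paper and its code.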
Latest articles in this journal
PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning
Artemis: Efficient Commit-and-Prove SNARKs for zkML
A Survey-Based Quantitative Analysis of Stress Factors and Their Impacts Among Cybersecurity Professionals
Log2graphs: An Unsupervised Framework for Log Anomaly Detection with Efficient Feature Extraction
Practical Investigation on the Distinguishability of Longa's Atomic Patterns