ModelShield: Adaptive and Robust Watermark Against Model Extraction Attack

IEEE Transactions on Information Forensics and Security · Impact Factor: 8.0 · CAS Region 1 (Computer Science) · JCR Q1 (Computer Science, Theory & Methods) · Published: 2025-01-16 · DOI: 10.1109/TIFS.2025.3530691
Kaiyi Pang;Tao Qi;Chuhan Wu;Minhao Bai;Minghu Jiang;Yongfeng Huang
Volume 20, pages 1767-1782 · https://ieeexplore.ieee.org/document/10843740/
Citations: 0

Abstract

Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks, thereby enhancing the commercial value of their intellectual property (IP). To protect this IP, model owners typically allow user access only in a black-box manner; however, adversaries can still use model extraction attacks to steal the model intelligence encoded in the generated outputs. Watermarking technology offers a promising defense against such attacks by embedding unique identifiers into the model-generated content. However, existing watermarking methods often compromise the quality of generated content through heuristic alterations and lack robust mechanisms to counteract adversarial strategies, limiting their practicality in real-world scenarios. In this paper, we introduce an adaptive and robust watermarking method (named ModelShield) to protect the IP of LLMs. Our method incorporates a self-watermarking mechanism that allows LLMs to autonomously insert watermarks into their generated content, avoiding degradation of the output. We also propose a robust watermark detection mechanism capable of effectively identifying watermark signals under the interference of varying adversarial strategies. In addition, ModelShield is a plug-and-play method that requires no additional model training, enhancing its applicability in LLM deployments. Extensive evaluations on two real-world datasets and three LLMs demonstrate that our method surpasses existing methods in defense effectiveness and robustness while significantly reducing the degradation that watermarking causes to the model-generated content.
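The abstract does not disclose ModelShield's implementation details. As a generic illustration of the detection idea only (statistical hypothesis testing over a keyed partition of tokens, in the style of green-list text watermarks — not ModelShield's actual mechanism), a minimal hypothetical sketch in Python; the key name and threshold are assumptions:

```python
import hashlib
import math

def is_green(token: str, key: str = "demo-key") -> bool:
    # A keyed hash deterministically assigns each token to the "green" half
    # of the vocabulary; without the key, the partition looks random.
    return hashlib.sha256((key + token).encode()).digest()[0] < 128

def detect_z(tokens: list[str], key: str = "demo-key", ratio: float = 0.5) -> float:
    # One-sided z-test: under no watermark, each token is green with
    # probability `ratio`; a watermarked generator over-samples green
    # tokens, pushing the z-score well above the ~2-3 significance range.
    n = len(tokens)
    greens = sum(is_green(t, key) for t in tokens)
    return (greens - ratio * n) / math.sqrt(n * ratio * (1 - ratio))
```

In such a scheme, outputs from a suspected extracted model are tokenized and scored; a large positive z-score indicates the watermark signal survived the extraction process.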
Source journal: IEEE Transactions on Information Forensics and Security (Engineering Technology — Engineering: Electronic & Electrical)
CiteScore: 14.40
Self-citation rate: 7.40%
Articles per year: 234
Review time: 6.5 months
Journal description: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.