Stealing Your Data from Compressed Machine Learning Models

Nuo Xu, Qi Liu, Tao Liu, Zihao Liu, Xiaochen Guo, Wujie Wen
DOI: 10.1109/DAC18072.2020.9218633
Venue: 2020 57th ACM/IEEE Design Automation Conference (DAC)
Published: 2020-07-01
Citations: 2

Abstract

Machine learning models have been widely deployed in many real-world tasks. When a non-expert data holder wants to use a third-party machine learning service for model training, it is critical to preserve the confidentiality of the training data. In this paper, we explore for the first time the potential privacy leakage in a scenario where a malicious ML provider offers the data holder customized training code that includes model compression, which is essential in practical deployment. The provider is unable to access the training process hosted by the secured third party, but can query the models once they are publicly released. As a result, the adversary can extract sensitive training data with high quality even from deeply compressed models that are tailored for resource-limited devices. Our investigation shows that existing compression techniques, such as quantization, can serve as a defense against such an attack by simultaneously degrading the model accuracy and the quality of the memorized data. To overcome this defense, we make an initial attempt to design a simple but stealthy quantized correlation encoding attack flow from the adversary's perspective. Three integrated components are developed accordingly: data pre-processing, layer-wise data-weight correlation regularization, and data-aware quantization. Extensive experimental results show that our framework preserves both the evasiveness and the effectiveness of stealing data from compressed models.
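The abstract does not give implementation details for the correlation-encoding step. As a rough illustration only, the sketch below shows one hypothetical form of a layer-wise data-weight correlation regularizer (a term subtracted from the training loss so that optimizing the model also drives a layer's weights to be linearly correlated with the secret data) together with adversary-side decoding by min-max rescaling of the released weights. The function names, the Pearson-correlation formulation, and the decoding scheme are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def correlation_regularizer(weights, secret, lam=1.0):
    # Hypothetical regularizer: absolute Pearson correlation between the
    # first len(secret) flattened weights and the secret data values.
    # Returning the negated term means adding it to the training loss
    # *encourages* correlation during optimization.
    w = weights.ravel()[: secret.size].astype(np.float64)
    s = secret.ravel().astype(np.float64)
    wc, sc = w - w.mean(), s - s.mean()
    corr = np.abs(wc @ sc) / (np.linalg.norm(wc) * np.linalg.norm(sc) + 1e-12)
    return -lam * corr

def decode(weights, n_values, lo=0.0, hi=1.0):
    # Adversary-side decoding sketch: read the first n_values weights of the
    # released model and rescale them back to the assumed data range.
    w = weights.ravel()[:n_values].astype(np.float64)
    w = (w - w.min()) / (w.max() - w.min() + 1e-12)
    return lo + w * (hi - lo)
```

If the weights end up as an affine function of the secret (perfect encoding), the regularizer reaches its minimum of `-lam` and min-max decoding recovers the secret up to an affine rescaling; quantization would perturb this correlation, which is why the paper's data-aware quantization component is needed on top.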