Not one but many Tradeoffs: Privacy Vs. Utility in Differentially Private Machine Learning

Benjamin Zi Hao Zhao, M. Kâafar, N. Kourtellis
{"title":"Not one but many Tradeoffs: Privacy Vs. Utility in Differentially Private Machine Learning","authors":"Benjamin Zi Hao Zhao, M. Kâafar, N. Kourtellis","doi":"10.1145/3411495.3421352","DOIUrl":null,"url":null,"abstract":"Data holders are increasingly seeking to protect their user's privacy, whilst still maximizing their ability to produce machine learning (ML) models with high quality predictions. In this work, we empirically evaluate various implementations of differential privacy (DP), and measure their ability to fend off real-world privacy attacks, in addition to measuring their core goal of providing accurate classifications. We establish an evaluation framework to ensure each of these implementations are fairly evaluated. Our selection of DP implementations add DP noise at different positions within the framework, either at the point of data collection/release, during updates while training of the model, or after training by perturbing learned model parameters. We evaluate each implementation across a range of privacy budgets and datasets, each implementation providing the same mathematical privacy guarantees. By measuring the models' resistance to real world attacks of membership and attribute inference, and their classification accuracy. we determine which implementations provide the most desirable tradeoff between privacy and utility. We found that the number of classes of a given dataset is unlikely to influence where the privacy and utility tradeoff occurs, a counter-intuitive inference in contrast to the known relationship of increased privacy vulnerability in datasets with more classes. Additionally, in the scenario that high privacy constraints are required, perturbing input training data before applying ML modeling does not trade off as much utility, as compared to noise added later in the ML process.","PeriodicalId":125943,"journal":{"name":"Proceedings of the 2020 ACM SIGSAC Conference on Cloud Computing Security Workshop","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 ACM SIGSAC Conference on Cloud Computing Security Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3411495.3421352","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

Abstract

Data holders are increasingly seeking to protect their users' privacy while still maximizing their ability to produce machine learning (ML) models with high-quality predictions. In this work, we empirically evaluate various implementations of differential privacy (DP), measuring their ability to fend off real-world privacy attacks in addition to their core goal of providing accurate classifications. We establish an evaluation framework to ensure that each of these implementations is fairly evaluated. Our selected DP implementations add DP noise at different positions within the framework: at the point of data collection/release, during the update steps while training the model, or after training by perturbing the learned model parameters. We evaluate each implementation across a range of privacy budgets and datasets, with each implementation providing the same mathematical privacy guarantees. By measuring the models' resistance to real-world membership and attribute inference attacks, together with their classification accuracy, we determine which implementations provide the most desirable tradeoff between privacy and utility. We found that the number of classes in a given dataset is unlikely to influence where the privacy-utility tradeoff occurs, a counter-intuitive finding given the known relationship between a larger number of classes and increased privacy vulnerability. Additionally, when high privacy constraints are required, perturbing the input training data before applying ML modeling does not trade off as much utility as noise added later in the ML process.
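
The abstract contrasts DP implementations that inject noise at different points in the ML pipeline: at data collection/release, during training updates, or on the learned parameters after training. The sketch below is a minimal, hypothetical illustration of those three injection points using NumPy; it is not the authors' code, the function names, noise_scale and clip_norm values are placeholders, and a real mechanism would calibrate the noise to query sensitivity and the privacy budget (epsilon, delta).

# Illustrative sketch only: three positions where DP-style noise can be injected.
# Gaussian noise with a fixed scale stands in for a properly calibrated mechanism.
import numpy as np

rng = np.random.default_rng(0)
noise_scale = 1.0  # placeholder; a real DP mechanism derives this from (epsilon, delta) and sensitivity

def perturb_input(X):
    # Input perturbation: noise added at data collection/release time.
    return X + rng.normal(0.0, noise_scale, size=X.shape)

def perturb_gradient(grad, clip_norm=1.0):
    # Gradient perturbation (DP-SGD style): clip the gradient, then add noise at each update.
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, noise_scale * clip_norm, size=grad.shape)

def perturb_output(weights):
    # Output perturbation: noise added to learned model parameters after training.
    return weights + rng.normal(0.0, noise_scale, size=weights.shape)

# Toy usage on random data
X = rng.normal(size=(8, 3))
g = rng.normal(size=3)
w = rng.normal(size=3)
print(perturb_input(X).shape, perturb_gradient(g), perturb_output(w))

All three mechanisms can be tuned to satisfy the same formal guarantee; the paper's point is that, at equal privacy budgets, they land at different places on the privacy-utility curve.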