On the Privacy of Noisy Stochastic Gradient Descent for Convex Optimization

SIAM Journal on Computing (IF 1.2; Zone 3, Computer Science; JCR Q3, Computer Science, Theory & Methods). Pub Date: 2024-07-19. DOI: 10.1137/23m1556538
Jason M. Altschuler, Jinho Bok, Kunal Talwar

Abstract

SIAM Journal on Computing, Volume 53, Issue 4, Page 969-1001, August 2024.
A central issue in machine learning is how to train models on sensitive user data. Industry has widely adopted a simple algorithm: Stochastic Gradient Descent (SGD) with noise (a.k.a. Stochastic Gradient Langevin Dynamics). However, foundational theoretical questions about this algorithm's privacy loss remain open, even in the seemingly simple setting of smooth convex losses over a bounded domain. Our main result resolves these questions: for a large range of parameters, we characterize the differential privacy up to a constant factor. This result reveals that all previous analyses for this setting have the wrong qualitative behavior. Specifically, while previous privacy bounds increase ad infinitum in the number of iterations, we show that after a small burn-in period, running SGD longer leaks no further privacy. Our analysis departs from previous approaches based on fast mixing, instead using techniques based on optimal transport (namely, Privacy Amplification by Iteration) and the Sampled Gaussian Mechanism (namely, Privacy Amplification by Sampling). Our techniques readily extend to other settings, e.g., strongly convex losses, nonuniform stepsizes, arbitrary batch sizes, and random or cyclic choice of batches.
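For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal sketch of noisy SGD over a bounded domain, written in plain Python. It is an illustration under stated assumptions, not the paper's exact specification: the function names (`noisy_sgd`, `project_l2_ball`, `squared_loss_grad`), the choice of a Euclidean ball as the bounded domain, uniform batch sampling without replacement, the placement of the Gaussian noise inside the gradient step, and the squared loss are all hypothetical choices made here for concreteness. The sketch also makes no privacy claim of its own; it only shows the update rule whose privacy loss the paper analyzes.

```python
import math
import random


def project_l2_ball(w, radius):
    """Project w onto the Euclidean ball of the given radius centered at 0."""
    norm = math.sqrt(sum(x * x for x in w))
    if norm <= radius:
        return list(w)
    return [x * radius / norm for x in w]


def noisy_sgd(data, grad, dim, steps, batch_size, stepsize, sigma, radius, seed=0):
    """Projected SGD with Gaussian noise added to each averaged batch gradient.

    Assumption: noise is drawn per coordinate as N(0, sigma^2) and scaled by
    the stepsize together with the gradient; other scalings are possible.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(steps):
        # Assumption: a uniformly random batch, sampled without replacement.
        batch = rng.sample(data, batch_size)
        g = [0.0] * dim
        for example in batch:
            gi = grad(w, example)
            for j in range(dim):
                g[j] += gi[j] / batch_size
        # Noisy gradient step, followed by projection onto the bounded domain.
        w = [w[j] - stepsize * (g[j] + rng.gauss(0.0, sigma)) for j in range(dim)]
        w = project_l2_ball(w, radius)
    return w


def squared_loss_grad(w, example):
    """Gradient of 0.5 * (<w, x> - y)^2, a smooth convex loss (illustrative)."""
    x, y = example
    residual = sum(wj * xj for wj, xj in zip(w, x)) - y
    return [residual * xj for xj in x]


# Tiny synthetic dataset of (features, label) pairs, made up for illustration.
data = [([1.0, 0.0], 0.5), ([0.0, 1.0], -0.5), ([1.0, 1.0], 0.0)]
w_final = noisy_sgd(data, squared_loss_grad, dim=2, steps=200,
                    batch_size=2, stepsize=0.1, sigma=0.05, radius=1.0)
```

The projection step is what keeps every iterate inside the bounded domain, which is the setting the abstract singles out. Intuitively, it is this boundedness that allows the privacy loss to stop growing after a burn-in period rather than increasing with every iteration.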
Source journal: SIAM Journal on Computing (Engineering & Technology: Computer Science, Theory & Methods)
CiteScore: 4.60
Self-citation rate: 0.00%
Articles published: 68
Review time: 6-12 weeks
About the journal: The SIAM Journal on Computing aims to provide coverage of the most significant work going on in the mathematical and formal aspects of computer science and nonnumerical computing. Submissions must be clearly written and make a significant technical contribution. Topics include but are not limited to analysis and design of algorithms, algorithmic game theory, data structures, computational complexity, computational algebra, computational aspects of combinatorics and graph theory, computational biology, computational geometry, computational robotics, the mathematical aspects of programming languages, artificial intelligence, computational learning, databases, information retrieval, cryptography, networks, distributed computing, parallel algorithms, and computer architecture.