A Theory to Instruct Differentially-Private Learning via Clipping Bias Reduction

Hanshen Xiao, Zihang Xiang, Di Wang, S. Devadas
{"title":"A Theory to Instruct Differentially-Private Learning via Clipping Bias Reduction","authors":"Hanshen Xiao, Zihang Xiang, Di Wang, S. Devadas","doi":"10.1109/SP46215.2023.10179409","DOIUrl":null,"url":null,"abstract":"We study the bias introduced in Differentially-Private Stochastic Gradient Descent (DP-SGD) with clipped or normalized per-sample gradient. As one of the most popular but artificial operations to ensure bounded sensitivity, gradient clipping enables composite privacy analysis of many iterative optimization methods without additional assumptions on either learning models or input data. Despite its wide applicability, gradient clipping also presents theoretical challenges in systematically instructing improvement of privacy or utility. In general, without an assumption on globally-bounded gradient, classic convergence analyses do not apply to clipped gradient descent. Further, given limited understanding of the utility loss, many existing improvements to DP-SGD are heuristic, especially in the applications of private deep learning.In this paper, we provide meaningful theoretical analysis validated by thorough empirical results of DP-SGD. We point out that the bias caused by gradient clipping is underestimated in previous works. For generic non-convex optimization via DP-SGD, we show one key factor contributing to the bias is the sampling noise of stochastic gradient to be clipped. Accordingly, we use the developed theory to build a series of improvements for sampling noise reduction from various perspectives. From an optimization angle, we study variance reduction techniques and propose inner-outer momentum. At the learning model (neural network) level, we propose several tricks to enhance network internal normalization and BatchClipping to carefully clip the gradient of a batch of samples. For data preprocessing, we provide theoretical justification of recently proposed improvements via data normalization and (self-)augmentation.Putting these systematic improvements together, private deep learning via DP-SGD can be significantly strengthened in many tasks. For example, in computer vision applications, with an (ϵ = 8, δ = 10−5) DP guarantee, we successfully train ResNet20 on CIFAR10 and SVHN with test accuracy 76.0% and 90.1%, respectively; for natural language processing, with (ϵ = 4, δ = 10−5), we successfully train a recurrent neural network on IMDb data with test accuracy 77.5%.","PeriodicalId":439989,"journal":{"name":"2023 IEEE Symposium on Security and Privacy (SP)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Symposium on Security and Privacy (SP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SP46215.2023.10179409","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

We study the bias introduced in Differentially-Private Stochastic Gradient Descent (DP-SGD) with clipped or normalized per-sample gradients. As one of the most popular, albeit artificial, operations to ensure bounded sensitivity, gradient clipping enables composite privacy analysis of many iterative optimization methods without additional assumptions on either the learning model or the input data. Despite its wide applicability, gradient clipping also presents theoretical challenges in systematically guiding improvements to privacy or utility. In general, without an assumption of globally bounded gradients, classic convergence analyses do not apply to clipped gradient descent. Further, given the limited understanding of the utility loss, many existing improvements to DP-SGD are heuristic, especially in applications of private deep learning. In this paper, we provide meaningful theoretical analysis of DP-SGD, validated by thorough empirical results. We point out that the bias caused by gradient clipping is underestimated in previous works. For generic non-convex optimization via DP-SGD, we show that one key factor contributing to this bias is the sampling noise of the stochastic gradient being clipped. Accordingly, we use the developed theory to build a series of improvements that reduce sampling noise from various perspectives. From an optimization angle, we study variance-reduction techniques and propose inner-outer momentum. At the learning-model (neural-network) level, we propose several tricks to enhance internal network normalization, along with BatchClipping, which carefully clips the gradient of a batch of samples. For data preprocessing, we provide theoretical justification for recently proposed improvements via data normalization and (self-)augmentation. Putting these systematic improvements together, private deep learning via DP-SGD can be significantly strengthened in many tasks. For example, in computer vision applications, with an (ϵ = 8, δ = 10⁻⁵) DP guarantee, we successfully train ResNet20 on CIFAR10 and SVHN with test accuracies of 76.0% and 90.1%, respectively; for natural language processing, with (ϵ = 4, δ = 10⁻⁵), we successfully train a recurrent neural network on IMDb data with test accuracy of 77.5%.
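To make the clipping operation concrete, below is a minimal NumPy sketch of the standard per-sample clipping step in DP-SGD, followed by a batch-level variant in the spirit of the BatchClipping idea mentioned above. All function names, parameters, and the noise calibration are illustrative assumptions, not the authors' implementation; in particular, the correct noise scale for batch-level clipping depends on a sensitivity analysis that differs from the per-sample case.

    import numpy as np

    def dp_sgd_step(params, per_sample_grads, clip_norm, noise_multiplier, lr, rng):
        # Standard DP-SGD step: clip each example's gradient to L2 norm
        # <= clip_norm, sum the clipped gradients, add Gaussian noise with
        # standard deviation noise_multiplier * clip_norm, then average.
        norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
        factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
        clipped_sum = (per_sample_grads * factors).sum(axis=0)
        noisy_sum = clipped_sum + rng.normal(scale=noise_multiplier * clip_norm,
                                             size=params.shape)
        return params - lr * noisy_sum / per_sample_grads.shape[0]

    def batch_clipping_step(params, per_sample_grads, clip_norm, sigma, lr, rng):
        # Batch-level variant: average the per-sample gradients first, then
        # clip the mean gradient once. Averaging before clipping reduces the
        # sampling noise seen by the clipping operator. NOTE: sigma is a
        # placeholder; the privacy accounting differs from the per-sample case.
        mean_grad = per_sample_grads.mean(axis=0)
        norm = np.linalg.norm(mean_grad)
        clipped = mean_grad * min(1.0, clip_norm / max(norm, 1e-12))
        return params - lr * (clipped + rng.normal(scale=sigma, size=params.shape))

    # Example usage on toy data:
    rng = np.random.default_rng(0)
    params = np.zeros(10)
    grads = rng.normal(size=(32, 10))   # 32 per-sample gradients of dimension 10
    params = dp_sgd_step(params, grads, clip_norm=1.0,
                         noise_multiplier=1.1, lr=0.1, rng=rng)

The contrast illustrates the abstract's point: in the per-sample version, clipping acts on individual stochastic gradients, so their sampling noise directly feeds the clipping bias; averaging before the nonlinearity, as in the batch-level variant, shrinks that noise first.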