A Theory to Instruct Differentially-Private Learning via Clipping Bias Reduction

Hanshen Xiao, Zihang Xiang, Di Wang, S. Devadas
{"title":"A Theory to Instruct Differentially-Private Learning via Clipping Bias Reduction","authors":"Hanshen Xiao, Zihang Xiang, Di Wang, S. Devadas","doi":"10.1109/SP46215.2023.10179409","DOIUrl":null,"url":null,"abstract":"We study the bias introduced in Differentially-Private Stochastic Gradient Descent (DP-SGD) with clipped or normalized per-sample gradient. As one of the most popular but artificial operations to ensure bounded sensitivity, gradient clipping enables composite privacy analysis of many iterative optimization methods without additional assumptions on either learning models or input data. Despite its wide applicability, gradient clipping also presents theoretical challenges in systematically instructing improvement of privacy or utility. In general, without an assumption on globally-bounded gradient, classic convergence analyses do not apply to clipped gradient descent. Further, given limited understanding of the utility loss, many existing improvements to DP-SGD are heuristic, especially in the applications of private deep learning.In this paper, we provide meaningful theoretical analysis validated by thorough empirical results of DP-SGD. We point out that the bias caused by gradient clipping is underestimated in previous works. For generic non-convex optimization via DP-SGD, we show one key factor contributing to the bias is the sampling noise of stochastic gradient to be clipped. Accordingly, we use the developed theory to build a series of improvements for sampling noise reduction from various perspectives. From an optimization angle, we study variance reduction techniques and propose inner-outer momentum. At the learning model (neural network) level, we propose several tricks to enhance network internal normalization and BatchClipping to carefully clip the gradient of a batch of samples. For data preprocessing, we provide theoretical justification of recently proposed improvements via data normalization and (self-)augmentation.Putting these systematic improvements together, private deep learning via DP-SGD can be significantly strengthened in many tasks. For example, in computer vision applications, with an (ϵ = 8, δ = 10−5) DP guarantee, we successfully train ResNet20 on CIFAR10 and SVHN with test accuracy 76.0% and 90.1%, respectively; for natural language processing, with (ϵ = 4, δ = 10−5), we successfully train a recurrent neural network on IMDb data with test accuracy 77.5%.","PeriodicalId":439989,"journal":{"name":"2023 IEEE Symposium on Security and Privacy (SP)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Symposium on Security and Privacy (SP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SP46215.2023.10179409","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

We study the bias introduced in Differentially-Private Stochastic Gradient Descent (DP-SGD) with clipped or normalized per-sample gradients. As one of the most popular, albeit artificial, operations to ensure bounded sensitivity, gradient clipping enables composite privacy analysis of many iterative optimization methods without additional assumptions on either the learning model or the input data. Despite its wide applicability, gradient clipping also presents theoretical challenges in systematically guiding improvements to privacy or utility. In general, without an assumption of globally bounded gradients, classic convergence analyses do not apply to clipped gradient descent. Further, given the limited understanding of the utility loss, many existing improvements to DP-SGD are heuristic, especially in applications of private deep learning. In this paper, we provide meaningful theoretical analysis of DP-SGD, validated by thorough empirical results. We point out that the bias caused by gradient clipping is underestimated in previous works. For generic non-convex optimization via DP-SGD, we show that one key factor contributing to this bias is the sampling noise of the stochastic gradient being clipped. Accordingly, we use the developed theory to build a series of improvements that reduce sampling noise from various perspectives. From an optimization angle, we study variance-reduction techniques and propose inner-outer momentum. At the learning-model (neural-network) level, we propose several tricks to enhance internal network normalization, along with BatchClipping, which carefully clips the gradient of a batch of samples. For data preprocessing, we provide theoretical justification for recently proposed improvements via data normalization and (self-)augmentation. Putting these systematic improvements together, private deep learning via DP-SGD can be significantly strengthened in many tasks. For example, in computer vision applications, with an (ϵ = 8, δ = 10⁻⁵) DP guarantee, we successfully train ResNet20 on CIFAR10 and SVHN with test accuracies of 76.0% and 90.1%, respectively; for natural language processing, with (ϵ = 4, δ = 10⁻⁵), we successfully train a recurrent neural network on IMDb data with test accuracy of 77.5%.
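To make the clipping operation concrete, below is a minimal NumPy sketch of the standard per-sample clipping step in DP-SGD, followed by a batch-level variant in the spirit of the BatchClipping idea mentioned above. All function names, parameters, and the noise calibration are illustrative assumptions, not the authors' implementation; in particular, the correct noise scale for batch-level clipping depends on a sensitivity analysis that differs from the per-sample case.

    import numpy as np

    def dp_sgd_step(params, per_sample_grads, clip_norm, noise_multiplier, lr, rng):
        # Standard DP-SGD step: clip each example's gradient to L2 norm
        # <= clip_norm, sum the clipped gradients, add Gaussian noise with
        # standard deviation noise_multiplier * clip_norm, then average.
        norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
        factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
        clipped_sum = (per_sample_grads * factors).sum(axis=0)
        noisy_sum = clipped_sum + rng.normal(scale=noise_multiplier * clip_norm,
                                             size=params.shape)
        return params - lr * noisy_sum / per_sample_grads.shape[0]

    def batch_clipping_step(params, per_sample_grads, clip_norm, sigma, lr, rng):
        # Batch-level variant: average the per-sample gradients first, then
        # clip the mean gradient once. Averaging before clipping reduces the
        # sampling noise seen by the clipping operator. NOTE: sigma is a
        # placeholder; the privacy accounting differs from the per-sample case.
        mean_grad = per_sample_grads.mean(axis=0)
        norm = np.linalg.norm(mean_grad)
        clipped = mean_grad * min(1.0, clip_norm / max(norm, 1e-12))
        return params - lr * (clipped + rng.normal(scale=sigma, size=params.shape))

    # Example usage on toy data:
    rng = np.random.default_rng(0)
    params = np.zeros(10)
    grads = rng.normal(size=(32, 10))   # 32 per-sample gradients of dimension 10
    params = dp_sgd_step(params, grads, clip_norm=1.0,
                         noise_multiplier=1.1, lr=0.1, rng=rng)

The contrast illustrates the abstract's point: in the per-sample version, clipping acts on individual stochastic gradients, so their sampling noise directly feeds the clipping bias; averaging before the nonlinearity, as in the batch-level variant, shrinks that noise first.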