Privacy-Preserving Frank-Wolfe on Shuffle Model
Ling-jie Zhang, Shi-song Wu, Hai Zhang
DOI: 10.1007/s10255-024-1095-6
Published: 2024-06-01 (Journal Article)
URL: https://link.springer.com/article/10.1007/s10255-024-1095-6
Citations: 0
Abstract
In this paper, we design differentially private variants of the classical Frank-Wolfe algorithm under the shuffle model for machine learning optimization. Under weak assumptions and the generalized linear loss (GLL) structure, we propose a noisy Frank-Wolfe with shuffle model algorithm (NoisyFWS) and a noisy variance-reduced Frank-Wolfe with shuffle model algorithm (NoisyVRFWS), which add calibrated Laplace noise under the shuffling scheme in the \(\ell_p\) (\(p \in [1, 2]\)) case, and we study their privacy and utility guarantees for Hölder-smooth GLLs. In particular, the privacy guarantees are mainly achieved via advanced composition and privacy amplification by shuffling. We analyze the utility bounds of NoisyFWS and NoisyVRFWS and obtain the optimal excess population risks \(\mathcal{O}\big(n^{-\frac{1+\alpha}{4\alpha}} + \frac{\log(d)\sqrt{\log(1/\delta)}}{n\epsilon}\big)\) and \(\mathcal{O}\big(n^{-\frac{1+\alpha}{4\alpha}} + \frac{\log(d)\sqrt{\log(1/\delta)}}{n^2\epsilon}\big)\), with gradient complexity \(\mathcal{O}\big(n^{\frac{(1+\alpha)^2}{4\alpha^2}}\big)\) for \(\alpha \in [1/\sqrt{3}, 1]\). It turns out that the risk rates under the shuffling scheme are nearly dimension-independent, which is consistent with previous work in some cases. In addition, there is a vital tradeoff between the \((\alpha, L)\)-Hölder smoothness of the GLL and the gradient complexity: the linear gradient complexity \(\mathcal{O}(n)\) is attained at \(\alpha = 1\).
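To make the core idea concrete, here is a minimal sketch of a noisy Frank-Wolfe loop in the spirit the abstract describes: Laplace noise is added to the gradient before the linear-minimization oracle, illustrated over the \(\ell_1\) ball (the \(p = 1\) case). The loss, noise scale, and step size below are illustrative assumptions for exposition, not the paper's exact calibration, and the function names are hypothetical.

```python
import numpy as np

def noisy_frank_wolfe(grad_fn, d, radius=1.0, steps=200, noise_scale=0.01, seed=0):
    """Sketch of a noisy Frank-Wolfe method over the l1 ball of given radius.

    grad_fn: gradient oracle of the (smooth) loss.
    noise_scale: scale of the per-coordinate Laplace noise (illustrative,
    not the paper's calibrated value).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    for t in range(steps):
        # Privatized gradient: true gradient plus calibrated Laplace noise.
        g = grad_fn(x) + rng.laplace(scale=noise_scale, size=d)
        # Linear-minimization oracle over the l1 ball: the minimizing vertex
        # is the signed scaled basis vector opposite the largest |g_i|.
        i = int(np.argmax(np.abs(g)))
        s = np.zeros(d)
        s[i] = -radius * np.sign(g[i])
        # Classical Frank-Wolfe step size.
        gamma = 2.0 / (t + 2.0)
        x = (1.0 - gamma) * x + gamma * s
    return x

# Toy usage: minimize ||x - b||^2 over the unit l1 ball (b is feasible here).
b = np.array([0.5, -0.2, 0.1])
x = noisy_frank_wolfe(lambda x: 2.0 * (x - b), d=3)
```

Because each iterate is a convex combination of ℓ1-ball vertices, the output stays feasible by construction, which is the structural property that lets Frank-Wolfe avoid projections.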