Learning Robust to Distributional Uncertainties and Adversarial Data

Alireza Sadeghi, Gang Wang, and Georgios B. Giannakis
{"title":"Learning Robust to Distributional Uncertainties and Adversarial Data","authors":"Alireza Sadeghi;Gang Wang;Georgios B. Giannakis","doi":"10.1109/JSAIT.2024.3381869","DOIUrl":null,"url":null,"abstract":"Successful training of data-intensive deep neural networks critically rely on vast, clean, and high-quality datasets. In practice however, their reliability diminishes, particularly with noisy, outlier-corrupted data samples encountered in testing. This challenge intensifies when dealing with anonymized, heterogeneous data sets stored across geographically distinct locations due to, e.g., privacy concerns. This present paper introduces robust learning frameworks tailored for centralized and federated learning scenarios. Our goal is to fortify model resilience with a focus that lies in (i) addressing distribution shifts from training to inference time; and, (ii) ensuring test-time robustness, when a trained model may encounter outliers or adversarially contaminated test data samples. To this aim, we start with a centralized setting where the true data distribution is considered unknown, but residing within a Wasserstein ball centered at the empirical distribution. We obtain robust models by minimizing the worst-case expected loss within this ball, yielding an intractable infinite-dimensional optimization problem. Upon leverage the strong duality condition, we arrive at a tractable surrogate learning problem. We develop two stochastic primal-dual algorithms to solve the resultant problem: one for \n<inline-formula> <tex-math>$\\epsilon $ </tex-math></inline-formula>\n-accurate convex sub-problems and another for a single gradient ascent step. We further develop a distributionally robust federated learning framework to learn robust model using heterogeneous data sets stored at distinct locations by solving per-learner’s sub-problems locally, offering robustness with modest computational overhead and considering data distribution. Numerical tests corroborate merits of our training algorithms against distributional uncertainties and adversarially corrupted test data samples.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"105-122"},"PeriodicalIF":0.0000,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE journal on selected areas in information theory","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10479184/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Successful training of data-intensive deep neural networks relies critically on vast, clean, and high-quality datasets. In practice, however, model reliability diminishes, particularly with the noisy, outlier-corrupted data samples encountered at test time. This challenge intensifies when dealing with anonymized, heterogeneous datasets stored across geographically distinct locations due to, e.g., privacy concerns. This paper introduces robust learning frameworks tailored to centralized and federated learning scenarios. Our goal is to fortify model resilience, with a focus on (i) addressing distribution shifts from training to inference time; and (ii) ensuring test-time robustness, when a trained model may encounter outliers or adversarially contaminated test data samples. To this aim, we start with a centralized setting in which the true data distribution is unknown but resides within a Wasserstein ball centered at the empirical distribution. We obtain robust models by minimizing the worst-case expected loss over this ball, which yields an intractable infinite-dimensional optimization problem. Leveraging strong duality, we arrive at a tractable surrogate learning problem. We develop two stochastic primal-dual algorithms to solve the resultant problem: one that solves the convex inner sub-problems to $\epsilon$-accuracy, and another that takes a single gradient ascent step. We further develop a distributionally robust federated learning framework that learns robust models from heterogeneous datasets stored at distinct locations by solving each learner's sub-problem locally, offering robustness with modest computational overhead while accounting for data heterogeneity. Numerical tests corroborate the merits of our training algorithms against distributional uncertainties and adversarially corrupted test data samples.
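For context, the worst-case problem and its dual surrogate summarized above follow the standard Wasserstein DRO template: with empirical distribution $\hat{P}_N$, ball radius $\rho$, transport cost $c(z, z_i)$, and dual variable $\gamma \ge 0$, strong duality gives

$$\min_{\theta}\; \sup_{Q:\, W(Q,\hat{P}_N)\le \rho}\; \mathbb{E}_{z\sim Q}\big[\ell(\theta;z)\big] \;=\; \min_{\theta}\; \inf_{\gamma\ge 0}\; \Big\{\gamma\rho + \frac{1}{N}\sum_{i=1}^{N}\sup_{z}\big[\ell(\theta;z)-\gamma\, c(z,z_i)\big]\Big\},$$

which replaces the infinite-dimensional inner problem over distributions with a finite sum of per-sample maximizations. This is the generic duality result; the paper's exact surrogate may differ in details not visible from the abstract.

As a rough illustration of how such a surrogate can be minimized, the following PyTorch sketch approximates each per-sample supremum by a few gradient-ascent steps under a squared-Euclidean transport cost, holding the dual variable fixed. The names wdro_loss, gamma, steps, and lr are illustrative assumptions rather than the authors' code, and the paper's primal-dual algorithms additionally update $\gamma$.

import torch

def wdro_loss(model, loss_fn, x, y, gamma=1.0, steps=15, lr=0.1):
    """Dual-surrogate Wasserstein DRO loss (sketch; gamma held fixed)."""
    # Inner maximization: for each sample x_i, gradient-ascend on
    # loss(theta; z) - gamma * ||z - x_i||^2 to approximate the per-sample sup.
    # Assumes batch-first inputs x of shape (N, ...).
    z = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        transport = ((z - x) ** 2).flatten(1).sum(dim=1).mean()
        obj = loss_fn(model(z), y) - gamma * transport
        grad, = torch.autograd.grad(obj, z)
        z = (z + lr * grad).detach().requires_grad_(True)
    # Outer minimization: train the model on the worst-case points found.
    # (The gamma * rho term of the dual is constant in theta and is omitted.)
    return loss_fn(model(z), y)

A training step would then backpropagate wdro_loss(model, loss_fn, x, y) through the model parameters as usual; making gamma a dual variable updated by projected gradient steps would recover the primal-dual flavor described above.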