Communication-Efficient and Byzantine-Robust Distributed Learning

Avishek Ghosh, R. Maity, S. Kadhe, A. Mazumdar, K. Ramchandran
{"title":"高效沟通和拜占庭鲁棒分布式学习","authors":"Avishek Ghosh, R. Maity, S. Kadhe, A. Mazumdar, K. Ramchandran","doi":"10.1109/ITA50056.2020.9245017","DOIUrl":null,"url":null,"abstract":"We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs a simple thresholding based on gradient norms to mitigate Byzantine failures. We show the (statistical) error-rate of our algorithm matches that of [YCKB18], which uses more complicated schemes (like coordinate-wise median or trimmed mean) and thus optimal. Furthermore, for communication efficiency, we consider a generic class of δ-approximate compressors from [KRSJ19] that encompasses sign-based compressors and top-k sparsification. Our algorithm uses compressed gradients and gradient norms for aggregation and Byzantine removal respectively. We establish the statistical error rate of the algorithm for arbitrary (convex or non-convex) smooth loss function. We show that, in the regime when the compression factor δ is constant and the dimension of the parameter space is fixed, the rate of convergence is not affected by the compression operation, and hence we effectively get the compression for free. Moreover, we extend the compressed gradient descent algorithm with error feedback proposed in [KRSJ19] for the distributed setting. We have experimentally validated our results and shown good performance in convergence for convex (least-square regression) and non-convex (neural network training) problems.","PeriodicalId":137257,"journal":{"name":"2020 Information Theory and Applications Workshop (ITA)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Communication-Efficient and Byzantine-Robust Distributed Learning\",\"authors\":\"Avishek Ghosh, R. Maity, S. Kadhe, A. Mazumdar, K. Ramchandran\",\"doi\":\"10.1109/ITA50056.2020.9245017\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs a simple thresholding based on gradient norms to mitigate Byzantine failures. We show the (statistical) error-rate of our algorithm matches that of [YCKB18], which uses more complicated schemes (like coordinate-wise median or trimmed mean) and thus optimal. Furthermore, for communication efficiency, we consider a generic class of δ-approximate compressors from [KRSJ19] that encompasses sign-based compressors and top-k sparsification. Our algorithm uses compressed gradients and gradient norms for aggregation and Byzantine removal respectively. We establish the statistical error rate of the algorithm for arbitrary (convex or non-convex) smooth loss function. We show that, in the regime when the compression factor δ is constant and the dimension of the parameter space is fixed, the rate of convergence is not affected by the compression operation, and hence we effectively get the compression for free. Moreover, we extend the compressed gradient descent algorithm with error feedback proposed in [KRSJ19] for the distributed setting. 
We have experimentally validated our results and shown good performance in convergence for convex (least-square regression) and non-convex (neural network training) problems.\",\"PeriodicalId\":137257,\"journal\":{\"name\":\"2020 Information Theory and Applications Workshop (ITA)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-11-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 Information Theory and Applications Workshop (ITA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ITA50056.2020.9245017\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 Information Theory and Applications Workshop (ITA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITA50056.2020.9245017","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs simple thresholding based on gradient norms to mitigate Byzantine failures. We show that the (statistical) error rate of our algorithm matches that of [YCKB18], which uses more complicated schemes (such as coordinate-wise median or trimmed mean), and is therefore optimal. Furthermore, for communication efficiency, we consider a generic class of δ-approximate compressors from [KRSJ19] that encompasses sign-based compressors and top-k sparsification. Our algorithm uses compressed gradients for aggregation and gradient norms for Byzantine removal. We establish the statistical error rate of the algorithm for arbitrary (convex or non-convex) smooth loss functions. We show that, in the regime where the compression factor δ is constant and the dimension of the parameter space is fixed, the rate of convergence is not affected by the compression operation, so we effectively get the compression for free. Moreover, we extend the compressed gradient-descent algorithm with error feedback proposed in [KRSJ19] to the distributed setting. We validate our results experimentally and show good convergence performance on convex (least-squares regression) and non-convex (neural network training) problems.
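The thresholding step at the heart of the algorithm is easy to sketch. Below is a minimal NumPy illustration of norm-based Byzantine removal at the server, assuming the server knows an upper bound `byzantine_frac` on the fraction of corrupted workers; the function name and the exact trimming rule are illustrative, not the paper's reference implementation.

```python
import numpy as np

def robust_aggregate(gradients, byzantine_frac):
    """Discard the workers reporting the largest gradient norms and
    average the rest. `byzantine_frac` is an assumed upper bound on
    the fraction of Byzantine workers."""
    grads = np.stack(gradients)                    # shape (m, d): one gradient per worker
    norms = np.linalg.norm(grads, axis=1)          # per-worker gradient norms
    m = grads.shape[0]
    n_keep = m - int(np.ceil(byzantine_frac * m))  # how many workers survive
    survivors = np.argsort(norms)[:n_keep]         # keep the smallest-norm gradients
    return grads[survivors].mean(axis=0)           # trimmed average
```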
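A δ-approximate compressor C, as defined in [KRSJ19], satisfies ‖C(x) − x‖² ≤ (1 − δ)‖x‖² for every x, with δ ∈ (0, 1]. The two families named in the abstract can be sketched as follows; the function names and signatures are ours, for illustration only.

```python
import numpy as np

def top_k(x, k):
    """Top-k sparsification: keep the k largest-magnitude coordinates,
    zero out the rest. A delta-approximate compressor with delta = k/d."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]  # indices of the k largest |x_i|
    out[idx] = x[idx]
    return out

def scaled_sign(x):
    """Scaled sign compressor: transmit only the signs, rescaled by the
    mean absolute value. Satisfies the delta-approximate property with
    delta = ||x||_1^2 / (d * ||x||_2^2)."""
    d = x.size
    return (np.linalg.norm(x, 1) / d) * np.sign(x)
```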
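Finally, here is a hedged sketch of how the pieces might fit together in one round of compressed, Byzantine-robust gradient descent with error feedback, in the spirit of the [KRSJ19] extension described in the abstract. The variable names and the exact order of operations are assumptions for illustration, not the authors' code.

```python
import numpy as np

def compressed_robust_step(x, local_grads, errors, lr, compressor, byzantine_frac):
    """One round of Byzantine-robust compressed gradient descent with
    error feedback. Each worker compresses its error-corrected scaled
    gradient and reports the compressed vector plus its norm; the server
    drops the largest-norm reports and averages the survivors."""
    messages, norms = [], []
    for i, g in enumerate(local_grads):
        p = lr * g + errors[i]           # add back the residual from the last round
        c = compressor(p)                # what the worker actually transmits
        errors[i] = p - c                # error feedback: keep what compression lost
        messages.append(c)
        norms.append(np.linalg.norm(c))
    m = len(local_grads)
    n_keep = m - int(np.ceil(byzantine_frac * m))
    survivors = np.argsort(norms)[:n_keep]          # norm-based Byzantine removal
    step = np.stack(messages)[survivors].mean(axis=0)
    return x - step, errors
```

In this sketch each worker's error buffer `errors[i]` persists across rounds and lives on that worker in a real deployment; for example, `compressed_robust_step(x, grads, errors, 0.1, lambda v: top_k(v, 10), 0.2)` would run one round with top-10 sparsification and a 20% Byzantine budget.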