Avoiding Communication in Logistic Regression

2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)
Pub Date: 2020-11-16
DOI: 10.1109/HiPC50609.2020.00023

Aditya Devarakonda, J. Demmel

Abstract

Stochastic gradient descent (SGD) is one of the most widely used optimization methods for solving various machine learning problems. SGD solves an optimization problem by iteratively sampling a few data points from the input data, computing gradients for the selected data points, and updating the solution. However, in a parallel setting, SGD requires interprocess communication at every iteration. We introduce a new communication-avoiding technique for solving the logistic regression problem using SGD. This technique re-organizes the SGD computations into a form that communicates every $s$ iterations instead of every iteration, where $s$ is a tuning parameter. We prove theoretical flops, bandwidth, and latency upper bounds for SGD and its new communication-avoiding variant. Furthermore, we show experimental results that illustrate that the new Communication-Avoiding SGD (CA-SGD) method can achieve speedups of up to 4.97× on a high-performance Infiniband cluster without altering the convergence behavior or accuracy.
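As a rough illustration of the reorganization described in the abstract, the sketch below (Python with NumPy, single process, no real communication) compares plain mini-batch SGD for logistic regression with an s-step variant. The variant forms a Gram matrix of the next s sampled batches up front and then recovers the intermediate inner products from it. This is a reconstruction under stated assumptions, not the authors' implementation: labels in {-1, +1}, a fixed step size eta, equal-size batches, and the function names sgd and s_step_sgd are all hypothetical. In a parallel setting with the features partitioned across processes, the products G and v would plausibly be the only quantities needing a reduction, once per s iterations.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sgd(A, y, batches, eta, x0):
    """Plain mini-batch SGD for logistic regression (labels in {-1, +1})."""
    x = x0.copy()
    for idx in batches:
        Z = A[idx] * y[idx, None]          # fold labels into the sampled rows
        u = sigmoid(-(Z @ x))              # needs Z @ x at every iteration
        x += (eta / len(idx)) * (Z.T @ u)  # gradient step
    return x


def s_step_sgd(A, y, batches, eta, x0, s):
    """Hypothetical s-step reorganization: one Gram matrix per s iterations."""
    x = x0.copy()
    for start in range(0, len(batches), s):
        group = batches[start:start + s]
        b = len(group[0])                  # assumption: equal-size batches
        Y = np.vstack([A[idx] * y[idx, None] for idx in group])
        # The two products below are where a parallel code would reduce
        # across processes, once for the whole group of iterations.
        G = Y @ Y.T
        v = Y @ x
        U = np.zeros(Y.shape[0])
        for j in range(len(group)):
            rows = slice(j * b, (j + 1) * b)
            # Y_j x_{k+j} = Y_j x_k + (eta/b) * sum_{i<j} Y_j Y_i^T u_i
            z = v[rows] + (eta / b) * (G[rows, :j * b] @ U[:j * b])
            U[rows] = sigmoid(-z)
        x += (eta / b) * (Y.T @ U)         # apply all deferred updates at once
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 10))
    y = rng.choice([-1.0, 1.0], size=200)
    batches = [rng.choice(200, size=4, replace=False) for _ in range(12)]
    x0 = np.zeros(10)
    x_ref = sgd(A, y, batches, 0.1, x0)
    x_ca = s_step_sgd(A, y, batches, 0.1, x0, s=4)
    print("max abs difference:", np.max(np.abs(x_ref - x_ca)))
```

Both loops consume the same batch sequence, so the iterates should agree up to floating-point rounding, which is consistent with the abstract's claim that convergence behavior is unchanged. The trade-off in this sketch is the extra (s*b)-by-(s*b) Gram matrix computed per outer step.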
