A Partition-insensitive Parallel Framework for Distributed Model Fitting

arXiv - STAT - Computation, Pub Date: 2024-06-02, DOI: arxiv-2406.00703

Xiaofei Wu, Rongmei Liang, Fabio Roli, Marcello Pelillo, Jing Yuan



Abstract

Distributed model fitting is the process of fitting a mathematical or statistical model to data using distributed computing resources, with computing tasks divided among multiple interconnected computers or nodes, often organized in a cluster or network. Most existing methods formulate distributed model fitting as a consensus optimization problem and then build algorithms on the alternating direction method of multipliers (ADMM). This paper introduces a novel parallel framework for distributed model fitting. In contrast to previous consensus frameworks, it offers two notable advantages. First, it is insensitive to sample partitioning: the solution of the algorithm is unaffected by the number of slave nodes and/or the amount of data each node carries. Second, fewer variables need to be updated at each iteration, so the proposed framework is more succinct and efficient and adapts to high-dimensional data. In addition, we prove that the algorithms under the new parallel framework have a worst-case linear convergence rate in theory. Numerical experiments confirm the generality, robustness, and accuracy of the proposed parallel framework.
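For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of textbook global-consensus ADMM (scaled form) applied to ordinary least squares. It is not the framework proposed in the paper, whose details are not given in the abstract. The function names, the synthetic data, the penalty `rho`, and the iteration count are all assumptions made for illustration. The sketch shows that each node keeps its own local copy `x[i]` and dual variable `u[i]`, so the number of variables updated per iteration grows with the number of nodes.

```python
import numpy as np


def consensus_admm_least_squares(blocks, rho=50.0, n_iter=500):
    """Textbook global-consensus ADMM for sum_i 0.5 * ||A_i x - b_i||^2.

    `blocks` is a list of (A_i, b_i) pairs, one per (hypothetical) node.
    """
    p = blocks[0][0].shape[1]
    m = len(blocks)
    z = np.zeros(p)        # global consensus variable held by the master
    x = np.zeros((m, p))   # local primal copies, one row per node
    u = np.zeros((m, p))   # scaled dual variables, one row per node

    # Per-node quantities that do not change across iterations.
    lhs = [A.T @ A + rho * np.eye(p) for A, _ in blocks]
    atb = [A.T @ b for A, b in blocks]

    for _ in range(n_iter):
        # Local step (parallel across nodes in a real deployment).
        for i in range(m):
            x[i] = np.linalg.solve(lhs[i], atb[i] + rho * (z - u[i]))
        # Master step: average the local estimates plus duals.
        z = (x + u).mean(axis=0)
        # Dual step: accumulate each node's disagreement with the consensus.
        u += x - z
    return z


def make_data(n=200, p=5, seed=0):
    """Synthetic regression data (illustrative assumption, not from the paper)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, p))
    beta = rng.standard_normal(p)
    b = A @ beta + 0.1 * rng.standard_normal(n)
    return A, b


def split_rows(A, b, m):
    """Partition the samples row-wise into m blocks."""
    return list(zip(np.array_split(A, m), np.array_split(b, m)))


A, b = make_data()
x_central = np.linalg.lstsq(A, b, rcond=None)[0]
x_admm_4 = consensus_admm_least_squares(split_rows(A, b, 4))
x_admm_8 = consensus_admm_least_squares(split_rows(A, b, 8))
```

In this small, strongly convex example both partitions converge to the centralized least-squares solution, so the sketch does not reproduce the partition sensitivity discussed in the abstract. It only makes the structure of the consensus baseline concrete: with `m` nodes and `p` features, each iteration updates `m * p` local variables, `m * p` dual variables, and `p` global variables.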


