{"title":"基于共识的高维回归并行统一算法与组合正则化","authors":"Xiaofei Wu , Rongmei Liang , Zhimin Zhang , Zhenyu Cui","doi":"10.1016/j.csda.2024.108081","DOIUrl":null,"url":null,"abstract":"<div><div>The parallel algorithm is widely recognized for its effectiveness in handling large-scale datasets stored in a distributed manner, making it a popular choice for solving statistical learning models. However, there is currently limited research on parallel algorithms specifically designed for high-dimensional regression with combined regularization terms. These terms, such as elastic-net, sparse group lasso, sparse fused lasso, and their nonconvex variants, have gained significant attention in various fields due to their ability to incorporate prior information and promote sparsity within specific groups or fused variables. The scarcity of parallel algorithms for combined regularizations can be attributed to the inherent nonsmoothness and complexity of these terms, as well as the absence of closed-form solutions for certain proximal operators associated with them. This paper proposes a <em>unified</em> constrained optimization formulation based on the consensus problem for these types of convex and nonconvex regression problems, and derives the corresponding parallel alternating direction method of multipliers (ADMM) algorithms. Furthermore, it is proven that the proposed algorithm not only has global convergence but also exhibits a linear convergence rate. It is worth noting that the computational complexity of the proposed algorithm remains the same for different regularization terms and losses, which implicitly demonstrates the universality of this algorithm. Extensive simulation experiments, along with a financial example, serve to demonstrate the reliability, stability, and scalability of our algorithm. The R package for implementing the proposed algorithm can be obtained at <span><span>https://github.com/xfwu1016/CPADMM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"203 ","pages":"Article 108081"},"PeriodicalIF":1.5000,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A unified consensus-based parallel algorithm for high-dimensional regression with combined regularizations\",\"authors\":\"Xiaofei Wu , Rongmei Liang , Zhimin Zhang , Zhenyu Cui\",\"doi\":\"10.1016/j.csda.2024.108081\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The parallel algorithm is widely recognized for its effectiveness in handling large-scale datasets stored in a distributed manner, making it a popular choice for solving statistical learning models. However, there is currently limited research on parallel algorithms specifically designed for high-dimensional regression with combined regularization terms. These terms, such as elastic-net, sparse group lasso, sparse fused lasso, and their nonconvex variants, have gained significant attention in various fields due to their ability to incorporate prior information and promote sparsity within specific groups or fused variables. The scarcity of parallel algorithms for combined regularizations can be attributed to the inherent nonsmoothness and complexity of these terms, as well as the absence of closed-form solutions for certain proximal operators associated with them. This paper proposes a <em>unified</em> constrained optimization formulation based on the consensus problem for these types of convex and nonconvex regression problems, and derives the corresponding parallel alternating direction method of multipliers (ADMM) algorithms. Furthermore, it is proven that the proposed algorithm not only has global convergence but also exhibits a linear convergence rate. It is worth noting that the computational complexity of the proposed algorithm remains the same for different regularization terms and losses, which implicitly demonstrates the universality of this algorithm. Extensive simulation experiments, along with a financial example, serve to demonstrate the reliability, stability, and scalability of our algorithm. The R package for implementing the proposed algorithm can be obtained at <span><span>https://github.com/xfwu1016/CPADMM</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":55225,\"journal\":{\"name\":\"Computational Statistics & Data Analysis\",\"volume\":\"203 \",\"pages\":\"Article 108081\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2024-10-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational Statistics & Data Analysis\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167947324001658\",\"RegionNum\":3,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Statistics & Data Analysis","FirstCategoryId":"100","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167947324001658","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0
摘要
并行算法在处理以分布式方式存储的大规模数据集方面的有效性已得到广泛认可,因此成为解决统计学习模型的热门选择。然而,目前专门针对具有组合正则化条款的高维回归而设计的并行算法的研究还很有限。这些术语,如 elastic-net、sparse group lasso、sparse fused lasso 及其非凸变体,由于能够在特定组或融合变量内纳入先验信息并促进稀疏性,在各个领域都获得了极大的关注。组合正则化并行算法的匮乏可归因于这些术语固有的非平稳性和复杂性,以及与之相关的某些近似算子缺乏闭式解。本文针对这些类型的凸回归和非凸回归问题,提出了基于共识问题的统一约束优化公式,并推导出相应的并行交替乘法(ADMM)算法。此外,还证明了所提出的算法不仅具有全局收敛性,而且还表现出线性收敛率。值得注意的是,对于不同的正则化项和损失,所提算法的计算复杂度保持不变,这隐含地证明了该算法的通用性。大量的模拟实验以及一个财务实例证明了我们算法的可靠性、稳定性和可扩展性。实现该算法的 R 软件包可从 https://github.com/xfwu1016/CPADMM 获取。
A unified consensus-based parallel algorithm for high-dimensional regression with combined regularizations
The parallel algorithm is widely recognized for its effectiveness in handling large-scale datasets stored in a distributed manner, making it a popular choice for solving statistical learning models. However, there is currently limited research on parallel algorithms specifically designed for high-dimensional regression with combined regularization terms. These terms, such as elastic-net, sparse group lasso, sparse fused lasso, and their nonconvex variants, have gained significant attention in various fields due to their ability to incorporate prior information and promote sparsity within specific groups or fused variables. The scarcity of parallel algorithms for combined regularizations can be attributed to the inherent nonsmoothness and complexity of these terms, as well as the absence of closed-form solutions for certain proximal operators associated with them. This paper proposes a unified constrained optimization formulation based on the consensus problem for these types of convex and nonconvex regression problems, and derives the corresponding parallel alternating direction method of multipliers (ADMM) algorithms. Furthermore, it is proven that the proposed algorithm not only has global convergence but also exhibits a linear convergence rate. It is worth noting that the computational complexity of the proposed algorithm remains the same for different regularization terms and losses, which implicitly demonstrates the universality of this algorithm. Extensive simulation experiments, along with a financial example, serve to demonstrate the reliability, stability, and scalability of our algorithm. The R package for implementing the proposed algorithm can be obtained at https://github.com/xfwu1016/CPADMM.
期刊介绍:
Computational Statistics and Data Analysis (CSDA), an Official Publication of the network Computational and Methodological Statistics (CMStatistics) and of the International Association for Statistical Computing (IASC), is an international journal dedicated to the dissemination of methodological research and applications in the areas of computational statistics and data analysis. The journal consists of four refereed sections which are divided into the following subject areas:
I) Computational Statistics - Manuscripts dealing with: 1) the explicit impact of computers on statistical methodology (e.g., Bayesian computing, bioinformatics,computer graphics, computer intensive inferential methods, data exploration, data mining, expert systems, heuristics, knowledge based systems, machine learning, neural networks, numerical and optimization methods, parallel computing, statistical databases, statistical systems), and 2) the development, evaluation and validation of statistical software and algorithms. Software and algorithms can be submitted with manuscripts and will be stored together with the online article.
II) Statistical Methodology for Data Analysis - Manuscripts dealing with novel and original data analytical strategies and methodologies applied in biostatistics (design and analytic methods for clinical trials, epidemiological studies, statistical genetics, or genetic/environmental interactions), chemometrics, classification, data exploration, density estimation, design of experiments, environmetrics, education, image analysis, marketing, model free data exploration, pattern recognition, psychometrics, statistical physics, image processing, robust procedures.
[...]
III) Special Applications - [...]
IV) Annals of Statistical Data Science [...]