Efficient sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm

Impact Factor: 1.6; Zone 3 (Mathematics); JCR Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Computational Statistics & Data Analysis; Pub Date: 2025-07-01; Epub Date: 2025-02-10; DOI: 10.1016/j.csda.2025.108146

Alexander C. McLain, Anja Zgodic, Howard Bondell

Computational Statistics & Data Analysis, Volume 207, Article 108146. Journal Article. https://www.sciencedirect.com/science/article/pii/S0167947325000222

Citations: 0

Abstract

Bayesian variable selection methods are powerful techniques for fitting sparse high-dimensional linear regression models. However, many are computationally intensive or require restrictive prior distributions on model parameters. A computationally efficient and powerful Bayesian approach is presented for sparse high-dimensional linear regression, requiring only minimal prior assumptions on parameters through plug-in empirical Bayes estimates of hyperparameters. The method employs a Parameter-Expanded Expectation-Conditional-Maximization (PX-ECM) algorithm to estimate maximum a posteriori (MAP) values of parameters via computationally efficient coordinate-wise optimization. The popular two-group approach to multiple testing motivates the E-step, resulting in a PaRtitiOned empirical Bayes Ecm (PROBE) algorithm for sparse high-dimensional linear regression. Both one-at-a-time and all-at-once optimization can be used to complete PROBE. Extensive simulation studies and analyses of cancer cell drug responses are conducted to compare PROBE's empirical properties with those of related methods. Implementation is available through the R package probe.
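The abstract says the two-group approach to multiple testing motivates the E-step. As background only, here is a minimal sketch of the generic two-group idea, not the PROBE algorithm or the `probe` package API. It assumes a test statistic drawn from a mixture of a standard normal null group and a wider normal non-null group. The function names and the values `pi0 = 0.9` and `alt_sd = 3.0` are hypothetical choices for illustration.

```python
import math


def normal_pdf(z, sd=1.0):
    """Density of a zero-mean normal with standard deviation sd."""
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi0=0.9, alt_sd=3.0):
    """Posterior probability that statistic z belongs to the null group.

    Assumed two-group mixture (illustrative, not taken from the paper):
        z ~ pi0 * N(0, 1) + (1 - pi0) * N(0, alt_sd^2)
    """
    f0 = normal_pdf(z, 1.0)      # null density
    f1 = normal_pdf(z, alt_sd)   # non-null density (assumed form)
    mixture = pi0 * f0 + (1.0 - pi0) * f1
    return pi0 * f0 / mixture


if __name__ == "__main__":
    for z in (0.0, 2.0, 4.0):
        print(f"z = {z:.1f}  P(null | z) = {local_fdr(z):.3f}")
```

Larger absolute statistics receive a smaller posterior null probability. This is the kind of per-coefficient weight a two-group view supplies. How PROBE actually builds and uses such quantities is specified in the paper itself.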


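Source journal

Computational Statistics & Data Analysis (Mathematics; Computer Science, Interdisciplinary Applications)

CiteScore: 3.70

Self-citation rate: 5.60%

Articles published: 167

Review time: 60 days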

About the journal: Computational Statistics and Data Analysis (CSDA), an Official Publication of the network Computational and Methodological Statistics (CMStatistics) and of the International Association for Statistical Computing (IASC), is an international journal dedicated to the dissemination of methodological research and applications in the areas of computational statistics and data analysis. The journal consists of four refereed sections which are divided into the following subject areas: I) Computational Statistics - Manuscripts dealing with: 1) the explicit impact of computers on statistical methodology (e.g., Bayesian computing, bioinformatics, computer graphics, computer intensive inferential methods, data exploration, data mining, expert systems, heuristics, knowledge based systems, machine learning, neural networks, numerical and optimization methods, parallel computing, statistical databases, statistical systems), and 2) the development, evaluation and validation of statistical software and algorithms. Software and algorithms can be submitted with manuscripts and will be stored together with the online article. II) Statistical Methodology for Data Analysis - Manuscripts dealing with novel and original data analytical strategies and methodologies applied in biostatistics (design and analytic methods for clinical trials, epidemiological studies, statistical genetics, or genetic/environmental interactions), chemometrics, classification, data exploration, density estimation, design of experiments, environmetrics, education, image analysis, marketing, model free data exploration, pattern recognition, psychometrics, statistical physics, image processing, robust procedures. [...] III) Special Applications - [...] IV) Annals of Statistical Data Science [...]

Latest articles from the journal

Random irregular histograms
Reduced-bias whittle likelihood estimation for short- and long-memory processes
Dirichlet process multi-state mixture models
Robust Bayesian high-dimensional variable selection and inference with the horseshoe family of priors
Adaptive accelerated failure time modeling with a semiparametric skewed error distribution