BayesSUR: An R Package for High-Dimensional Multivariate Bayesian Variable and Covariance Selection in Linear Regression

IF 8.1 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Journal of Statistical Software | Pub Date: 2021-04-28 | DOI: 10.18637/jss.v100.i11

Zhi Zhao, Marco Banterle, L. Bottolo, S. Richardson, A. Lewin, M. Zucknick

Citations: 8

Abstract

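In molecular biology, advances in high-throughput technologies have made it possible to study complex multivariate phenotypes and their simultaneous associations with high-dimensional genomic and other omics data, a problem that can be studied with high-dimensional multi-response regression, where the response variables are potentially highly correlated. To this end, we recently introduced several multivariate Bayesian variable and covariance selection models, e.g., Bayesian estimation methods for sparse seemingly unrelated regression for variable and covariance selection. Several variable selection priors have been implemented in this context, in particular the hotspot detection prior for latent variable inclusion indicators, which results in sparse variable selection for associations between predictors and multiple phenotypes. Here, we also propose an alternative, which uses a Markov random field (MRF) prior for incorporating prior knowledge about the dependence structure of the inclusion indicators. Inference of Bayesian seemingly unrelated regression (SUR) by Markov chain Monte Carlo methods is made computationally feasible by factorisation of the covariance matrix amongst the response variables. In this paper we present BayesSUR, an R package, which allows the user to easily specify and run a range of different Bayesian SUR models, which have been implemented in C++ for computational efficiency. The R package allows the specification of the models in a modular way, where the user chooses the priors for variable selection and for covariance selection separately. We demonstrate the performance of sparse SUR models with the hotspot prior and spike-and-slab MRF prior on synthetic and real data sets representing eQTL or mQTL studies and in vitro anti-cancer drug screening studies as examples for typical applications.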
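
The abstract credits a factorisation of the covariance matrix amongst the response variables with making MCMC for Bayesian SUR computationally feasible. A general fact behind this kind of factorisation is that a multivariate normal likelihood for correlated responses can be rewritten as a chain of univariate conditional regressions, in which each response is regressed on the residuals of the responses before it. The Python sketch below checks that identity numerically for a single observation. It is an illustration under stated assumptions, not the BayesSUR implementation: the function names `mvn_logpdf` and `sequential_logpdf` are hypothetical, only numpy is assumed, and the exact parametrisation used in the package's C++ code may differ.

```python
import numpy as np

# Hypothetical helper (not part of BayesSUR): joint log-density of one row y ~ N(mean, cov).
def mvn_logpdf(y, mean, cov):
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)           # log|cov|, cov assumed positive definite
    quad = diff @ np.linalg.solve(cov, diff)     # (y - mean)^T cov^{-1} (y - mean)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + quad)

# Hypothetical helper: the same density as a product of univariate conditionals,
# y_k | y_1..y_{k-1} ~ N(mean_k + rho_k^T (y_{<k} - mean_{<k}), sigma2_k).
def sequential_logpdf(y, mean, cov):
    total = 0.0
    for k in range(len(y)):
        if k == 0:
            cond_mean, cond_var = mean[0], cov[0, 0]
        else:
            s11 = cov[:k, :k]                    # covariance of the earlier responses
            s12 = cov[:k, k]                     # covariance of earlier responses with y_k
            rho = np.linalg.solve(s11, s12)      # regression coefficients on earlier residuals
            cond_mean = mean[k] + rho @ (y[:k] - mean[:k])
            cond_var = cov[k, k] - s12 @ rho     # conditional (residual) variance sigma2_k
        total += -0.5 * (np.log(2 * np.pi * cond_var)
                         + (y[k] - cond_mean) ** 2 / cond_var)
    return total

# Usage with made-up numbers: s = 4 correlated responses for one sample.
rng = np.random.default_rng(0)
s = 4
a = rng.standard_normal((s, s))
cov = a @ a.T + s * np.eye(s)    # a positive-definite residual covariance
mean = rng.standard_normal(s)    # stands in for the per-response linear predictors x_i^T beta_k
y = rng.standard_normal(s)
print(np.isclose(mvn_logpdf(y, mean, cov), sequential_logpdf(y, mean, cov)))  # True
```

Each term in the chain is an ordinary univariate normal regression, which is what allows a sampler to work response by response instead of with the full covariance matrix at once; how BayesSUR exploits this in detail is described in the full paper.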

Source journal

Journal of Statistical Software (Engineering & Technology: Computer Science, Interdisciplinary Applications)

CiteScore: 10.70

Self-citation rate: 1.70%

Review time: 6-12 weeks

About the journal: The Journal of Statistical Software (JSS) publishes open-source software and corresponding reproducible articles discussing all aspects of the design, implementation, documentation, application, evaluation, comparison, maintenance and distribution of software dedicated to improving the state of the art in statistical computing in all areas of empirical research. Open-source code and articles are jointly reviewed and published in this journal and should be accessible to a broad community of practitioners, teachers, and researchers in the field of statistics.

Latest articles in this journal

BoXHED2.0: Scalable Boosting of Dynamic Survival Analysis
Optimum Allocation for Adaptive Multi-Wave Sampling in R: The R Package optimall
spsurvey: Spatial Sampling Design and Analysis in R
Application of Equal Local Levels to Improve Q-Q Plot Testing Bands with R Package qqconf
Elastic Net Regularization Paths for All Generalized Linear Models