The R package skpr provides a suite of functions to generate and evaluate experimental designs. Package skpr generates D, I, Alias, A, E, T, and G-optimal designs, and supports custom user-defined optimality criteria, N-level split-plot designs, mixture designs, and design augmentation. Also included is a collection of analytic and Monte Carlo power evaluation functions for normal, non-normal, random effects, and survival models, as well as tools to produce fraction of design space plots and correlation maps. Additionally, skpr includes a flexible framework for the user to perform custom power analyses with external libraries and user-defined functions, as well as a graphical user interface that wraps most of the functionality of the package in a point-and-click web application.
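As a rough illustration of the generate-then-evaluate workflow, the sketch below uses skpr's gen_design() and eval_design() functions; the factor names, run size, and model are invented for illustration rather than taken from the paper:

    library(skpr)

    # Candidate set: full factorial over two continuous factors and one categorical factor
    candidates <- expand.grid(temp = c(-1, 0, 1),
                              time = c(-1, 0, 1),
                              type = c("A", "B"))

    # 12-run D-optimal design for a main-effects model
    design <- gen_design(candidateset = candidates,
                         model        = ~ temp + time + type,
                         trials       = 12,
                         optimality   = "D")

    # Analytic power evaluation at a 5% significance level
    eval_design(design, model = ~ temp + time + type, alpha = 0.05)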
{"title":"Optimal Design Generation and Power Evaluation in R: The skpr Package","authors":"T. Morgan-Wall, George C. Khoury","doi":"10.18637/jss.v099.i01","DOIUrl":"https://doi.org/10.18637/jss.v099.i01","url":null,"abstract":"The R package skpr provides a suite of functions to generate and evaluate experimental designs. Package skpr generates D, I, Alias, A, E, T, and G-optimal designs, and supports custom user-defined optimality criteria, N-level split-plot designs, mixture designs, and design augmentation. Also included are a collection of analytic and Monte Carlo power evaluation functions for normal, non-normal, random effects, and survival models, as well as tools to plot fraction of design space plots and correlation maps. Additionally, skpr includes a flexible framework for the user to perform custom power analyses with external libraries and user-defined functions, as well as a graphical user interface that wraps most of the functionality of the package in a point-and-click web application.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"4 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85314254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes the R package cold for the analysis of count longitudinal data. In this package, marginal and random effects models are considered. In both cases, estimation is via maximization of the exact likelihood, and serial dependence among observations is assumed to be of Markovian type, referred to as the integer-valued autoregressive process of order one. For random effects models, adaptive Gaussian quadrature and Monte Carlo methods are used to compute integrals whose dimension depends on the structure of the random effects. cold is written partly in R and partly in Fortran 77, interfaced through R, and is built following the S4 formulation of R methods.
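A minimal sketch of a possible call is given below; the simulated data frame, variable names, and the dependence argument and its value are assumptions based on the description above rather than verified cold syntax:

    library(cold)

    # Placeholder longitudinal count data: 20 subjects, 5 time points each
    set.seed(1)
    mydata <- data.frame(id     = rep(1:20, each = 5),
                         time   = rep(1:5, times = 20),
                         treat  = rep(0:1, each = 50),
                         counts = rpois(100, lambda = 3))

    # Random-intercept model with Markovian (INAR(1)-type) serial dependence;
    # the random effect integral is handled internally by adaptive Gaussian
    # quadrature or Monte Carlo integration
    fit <- cold(counts ~ treat + time, data = mydata, dependence = "AR1R")
    summary(fit)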
{"title":"cold: An R Package for the Analysis of Count Longitudinal Data","authors":"M. H. Gonçalves, M. S. Cabral","doi":"10.18637/jss.v099.i03","DOIUrl":"https://doi.org/10.18637/jss.v099.i03","url":null,"abstract":"This paper describes the R package cold for the analysis of count longitudinal data. In this package marginal and random effects models are considered. In both cases estimation is via maximization of the exact likelihood and serial dependence among observations is assumed to be of Markovian type and referred as the integer-valued autoregressive of order one process. For random effects models adaptive Gaussian quadrature and Monte Carlo methods are used to compute integrals whose dimension depends on the structure of random effects. cold is written partly in R language, partly in Fortran 77, interfaced through R and is built following the S4 formulation of R methods.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"5 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84793587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Nordhausen, M. Matilainen, J. Miettinen, Joni Virta, S. Taskinen
Multivariate time series observations are increasingly common in multiple fields of science, but the complex dependencies of such data often translate into intractable models with a large number of parameters. An alternative is to first reduce the dimension of the series and then model the resulting uncorrelated signals univariately, avoiding the need for any covariance parameters. A popular and effective framework for this is blind source separation. In this paper we review the dimension reduction tools for time series available in the R package tsBSS. These include methods for estimating the signal dimension of second-order stationary time series, dimension reduction techniques for stochastic volatility models, and supervised dimension reduction tools for time series regression. Several examples are provided to illustrate the functionality of the package.
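A minimal sketch of two of the analyses mentioned above, using the package's vSOBI() and tssdr() functions on simulated placeholder data; default arguments are assumed rather than taken from the documentation:

    library(tsBSS)

    set.seed(1)
    X <- ts(matrix(rnorm(3 * 1000), ncol = 3))   # placeholder 3-variate series

    # Blind source separation aimed at stochastic-volatility-type latent signals
    bss_fit <- vSOBI(X)

    # Supervised dimension reduction of X with respect to a response series y
    y <- rnorm(1000)                             # placeholder response
    sdr_fit <- tssdr(y, X)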
{"title":"Dimension Reduction for Time Series in a Blind Source Separation Context Using R","authors":"K. Nordhausen, M. Matilainen, J. Miettinen, Joni Virta, S. Taskinen","doi":"10.18637/jss.v098.i15","DOIUrl":"https://doi.org/10.18637/jss.v098.i15","url":null,"abstract":"Multivariate time series observations are increasingly common in multiple fields of science but the complex dependencies of such data often translate into intractable models with large number of parameters. An alternative is given by first reducing the dimension of the series and then modelling the resulting uncorrelated signals univariately, avoiding the need for any covariance parameters. A popular and effective framework for this is blind source separation. In this paper we review the dimension reduction tools for time series available in the R package tsBSS. These include methods for estimating the signal dimension of second-order stationary time series, dimension reduction techniques for stochastic volatility models and supervised dimension reduction tools for time series regression. Several examples are provided to illustrate the functionality of the package.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"94 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80687877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-07-01. Epub Date: 2021-07-10. DOI: 10.18637/jss.v098.i12.
Tyler Grimes, Somnath Datta
Gene expression data provide an abundant resource for inferring connections in gene regulatory networks. While methodologies developed for this task have shown success, a challenge remains in comparing the performance among methods. Gold-standard datasets are scarce and limited in use, and while tools for simulating expression data are available, they are not designed to resemble the data obtained from RNA-seq experiments. SeqNet is an R package that provides tools for generating a rich variety of gene network structures and simulating RNA-seq data from them. This produces in silico RNA-seq data for benchmarking and assessing gene network inference methods. The package is available on CRAN and on GitHub at https://github.com/tgrimes/SeqNet.
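A minimal sketch of the simulation workflow; the function names reflect the package as described above, but exact signatures and return values are assumptions to be checked against the SeqNet documentation:

    library(SeqNet)

    set.seed(1)
    net <- random_network(100)            # random gene-gene network with 100 genes
    net <- gen_partial_correlations(net)  # attach connection weights for simulation
    sim <- gen_rnaseq(50, net)            # simulate 50 in silico RNA-seq samples
    dim(sim$x)                            # expression matrix: samples x genes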
{"title":"SeqNet: An R Package for Generating Gene-Gene Networks and Simulating RNA-Seq Data.","authors":"Tyler Grimes, Somnath Datta","doi":"10.18637/jss.v098.i12","DOIUrl":"10.18637/jss.v098.i12","url":null,"abstract":"<p><p>Gene expression data provide an abundant resource for inferring connections in gene regulatory networks. While methodologies developed for this task have shown success, a challenge remains in comparing the performance among methods. Gold-standard datasets are scarce and limited in use. And while tools for simulating expression data are available, they are not designed to resemble the data obtained from RNA-seq experiments. SeqNet is an R package that provides tools for generating a rich variety of gene network structures and simulating RNA-seq data from them. This produces <i>in silico</i> RNA-seq data for benchmarking and assessing gene network inference methods. The package is available on CRAN and on GitHub at https://github.com/tgrimes/SeqNet.</p>","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"98 12","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8315007/pdf/nihms-1647401.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39254580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Kunzmann, Maximilian Pilz, Carolin Herrmann, G. Rauch, M. Kieser
Even though adaptive two-stage designs with unblinded interim analyses are becoming increasingly popular in clinical trials, there is a lack of statistical software to make their application more straightforward. The package adoptr fills this gap for the common case of two-stage one- or two-arm trials with (approximately) normally distributed outcomes. In contrast to previous approaches, adoptr optimizes the entire design upfront, which allows maximal efficiency. To facilitate experimentation with different objective functions, adoptr supports a flexible way of specifying both (composite) objective scores and (conditional) constraints by the user. Special emphasis was put on providing measures to aid practitioners with the validation process of the package.
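A rough sketch of the optimize-the-design-upfront workflow, loosely following the package's vocabulary of scores and constraints; the numeric settings are illustrative, and exact argument names and defaults may differ from the released adoptr API:

    library(adoptr)

    # Data distribution and point hypotheses on the standardized effect size
    datadist <- Normal(two_armed = TRUE)
    H_0      <- PointMassPrior(0.0, 1)
    H_1      <- PointMassPrior(0.4, 1)

    # Objective and constraints: minimize expected sample size under H_1,
    # subject to power >= 0.8 and type one error rate <= 0.025
    ess   <- ExpectedSampleSize(datadist, H_1)
    power <- Power(datadist, H_1)
    toer  <- Power(datadist, H_0)

    init <- get_initial_design(theta = 0.4, alpha = 0.025, beta = 0.2)
    opt  <- minimize(ess, subject_to(power >= 0.8, toer <= 0.025), init)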
{"title":"The adoptr Package: Adaptive Optimal Designs for Clinical Trials in R","authors":"K. Kunzmann, Maximilian Pilz, Carolin Herrmann, G. Rauch, M. Kieser","doi":"10.18637/jss.v098.i09","DOIUrl":"https://doi.org/10.18637/jss.v098.i09","url":null,"abstract":"Even though adaptive two-stage designs with unblinded interim analyses are becoming increasingly popular in clinical trial designs, there is a lack of statistical software to make their application more straightforward. The package adoptr fills this gap for the common case of two-stage one- or two-arm trials with (approximately) normally distributed outcomes. In contrast to previous approaches, adoptr optimizes the entire design upfront which allows maximal efficiency. To facilitate experimentation with different objective functions, adoptr supports a flexible way of specifying both (composite) objective scores and (conditional) constraints by the user. Special emphasis was put on providing measures to aid practitioners with the validation process of the package.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"9 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74829679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Krivitsky, David R. Hunter, M. Morris, Chad Klumb
The ergm package supports the statistical analysis and simulation of network data. It anchors the statnet suite of packages for network analysis in R, introduced in a special issue of the Journal of Statistical Software in 2008. This article provides an overview of the new functionality in the 2021 release of ergm version 4. The new features include more flexible handling of nodal covariates, term operators that extend and simplify model specification, new models for networks with valued edges, improved handling of constraints on the sample space of networks, and estimation with missing edge data. We also identify the new packages in the statnet suite that extend ergm's functionality to other network data types and structural features, and the robust set of online resources that support the statnet development process and applications.
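A small illustration of the ergm() formula interface (a generic example, not drawn from the paper and not showing the new version-4 term operators):

    library(ergm)

    data(faux.mesa.high)   # in-school friendship network shipped with ergm

    # Model with an edge count term and homophily terms on grade and sex
    fit <- ergm(faux.mesa.high ~ edges + nodematch("Grade") + nodematch("Sex"))
    summary(fit)

    # Simulate from the fitted model and run goodness-of-fit diagnostics
    sims <- simulate(fit, nsim = 10)
    plot(gof(fit))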
{"title":"ergm 4: New Features for Analyzing Exponential-Family Random Graph Models","authors":"P. Krivitsky, David R. Hunter, M. Morris, Chad Klumb","doi":"10.18637/jss.v105.i06","DOIUrl":"https://doi.org/10.18637/jss.v105.i06","url":null,"abstract":"The ergm package supports the statistical analysis and simulation of network data. It anchors the statnet suite of packages for network analysis in R introduced in a special issue in Journal of Statistical Software in 2008. This article provides an overview of the new functionality in the 2021 release of ergm version 4. These include more flexible handling of nodal covariates, term operators that extend and simplify model specification, new models for networks with valued edges, improved handling of constraints on the sample space of networks, and estimation with missing edge data. We also identify the new packages in the statnet suite that extend ergm's functionality to other network data types and structural features and the robust set of online resources that support the statnet development process and applications.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"50 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85807703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present mexhaz, an R package for fitting flexible hazard-based regression models, with the possibility to add time-dependent effects of covariates and to account for a two-level hierarchical structure in the data through the inclusion of a normally distributed random intercept (i.e., a log-normally distributed shared frailty). Moreover, mexhaz-based models can be fitted within the excess hazard setting by allowing the specification of an expected hazard in the model. These models are commonly used in the analysis of population-based cancer registry data. Follow-up time can be entered in the right-censored or counting process input style, the latter allowing models with delayed entries. The logarithm of the baseline hazard can be flexibly modeled with B-splines or restricted cubic splines of time. Parameter estimation is based on likelihood maximization: in deriving the contribution of each observation to the cluster-specific conditional likelihood, Gauss-Legendre quadrature is used to calculate the cumulative hazard; the cluster-specific marginal likelihoods are then obtained by integrating over the random effects distribution using adaptive Gauss-Hermite quadrature. Functions to compute and plot the predicted (excess) hazard and (net) survival (possibly with cluster-specific predictions in the case of random effect models) are provided. We illustrate the use of the different options of the mexhaz package and compare the results obtained with those of other available R packages.
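A minimal sketch of fitting an excess hazard model with a random intercept; the simulated data frame, variable names, and the expected-hazard column are placeholders, and the argument names should be checked against the mexhaz documentation:

    library(mexhaz)
    library(survival)

    # Placeholder data: follow-up time, vital status, covariates, expected
    # population hazard rate, and a cluster identifier
    set.seed(1)
    mydata <- data.frame(time     = rexp(200, rate = 0.1),
                         status   = rbinom(200, 1, 0.7),
                         age      = rnorm(200, 60, 10),
                         sex      = sample(c("male", "female"), 200, replace = TRUE),
                         exp.rate = runif(200, 0.001, 0.01),
                         cluster  = sample(1:10, 200, replace = TRUE))

    # Excess hazard model: log baseline hazard as a cubic B-spline of time,
    # expected hazard taken from 'exp.rate', random intercept on 'cluster'
    fit <- mexhaz(Surv(time, status) ~ age + sex, data = mydata,
                  base = "exp.bs", degree = 3, knots = c(1, 5),
                  expected = "exp.rate", random = "cluster")

    # Predicted (excess) hazard and (net) survival at chosen time points
    predict(fit, time.pts = c(1, 3, 5),
            data.val = data.frame(age = 60, sex = "male"))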
{"title":"mexhaz: An R Package for Fitting Flexible Hazard-Based Regression Models for Overall and Excess Mortality with a Random Effect","authors":"H. Charvat, A. Belot","doi":"10.18637/jss.v098.i14","DOIUrl":"https://doi.org/10.18637/jss.v098.i14","url":null,"abstract":"We present mexhaz, an R package for fitting flexible hazard-based regression models with the possibility to add time-dependent effects of covariates and to account for a two level hierarchical structure in the data through the inclusion of a normally distributed random intercept (i.e., a log-normally distributed shared frailty). Moreover, mexhaz based models can be fitted within the excess hazard setting by allowing the specification of an expected hazard in the model. These models are of common use in the context of the analysis of population-based cancer registry data. Follow-up time can be entered in the right-censored or counting process input style, the latter allowing models with delayed entries. The logarithm of the baseline hazard can be flexibly modeled with B-splines or restricted cubic splines of time. Parameters estimation is based on likelihood maximization: in deriving the contribution of each observation to the cluster-specific conditional likelihood, Gauss-Legendre quadrature is used to calculate the cumulative hazard; the cluster-specific marginal likelihoods are then obtained by integrating over the random effects distribution, using adaptive Gauss-Hermite quadrature. Functions to compute and plot the predicted (excess) hazard and (net) survival (possibly with cluster-specific predictions in the case of random effect models) are provided. We illustrate the use of the different options of the mexhaz package and compare the results obtained with those of other available R packages.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"3 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82399466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Polina Suter, Jack Kuipers, G. Moffa, N. Beerenwinkel
The R package BiDAG implements Markov chain Monte Carlo (MCMC) methods for structure learning and sampling of Bayesian networks. The package includes tools to search for a maximum a posteriori (MAP) graph and to sample graphs from the posterior distribution given the data. A new hybrid approach to structure learning enables inference in large graphs. In the first step, we define a reduced search space by means of the PC algorithm or based on prior knowledge. In the second step, an iterative order MCMC scheme proceeds to optimize within the restricted search space and estimate the MAP graph. Sampling from the posterior distribution is implemented using either order or partition MCMC. The models and algorithms can handle both discrete and continuous data. The BiDAG package also provides an implementation of MCMC schemes for structure learning and sampling of dynamic Bayesian networks.
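A minimal sketch of the two-step workflow described above (MAP search, then posterior sampling); the placeholder data are simulated, and the argument names are assumptions to be verified against the BiDAG documentation:

    library(BiDAG)

    # Placeholder continuous data on 6 variables
    set.seed(1)
    myData <- matrix(rnorm(200 * 6), ncol = 6)
    colnames(myData) <- paste0("V", 1:6)

    # Score object: BGe score for continuous data ("bde" would be used for binary data)
    score <- scoreparameters("bge", myData)

    # Step 1 + 2: learn a restricted search space and estimate a MAP DAG
    mapfit <- iterativeMCMC(score)

    # Sample DAGs from the posterior within the learned search space
    samplefit <- orderMCMC(score, MAP = FALSE, chainout = TRUE)
    # alternatively: partitionMCMC(score)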
{"title":"Bayesian Structure Learning and Sampling of Bayesian Networks with the R Package BiDAG","authors":"Polina Suter, Jack Kuipers, G. Moffa, N. Beerenwinkel","doi":"10.18637/jss.v105.i09","DOIUrl":"https://doi.org/10.18637/jss.v105.i09","url":null,"abstract":"The R package BiDAG implements Markov chain Monte Carlo (MCMC) methods for structure learning and sampling of Bayesian networks. The package includes tools to search for a maximum a posteriori (MAP) graph and to sample graphs from the posterior distribution given the data. A new hybrid approach to structure learning enables inference in large graphs. In the first step, we define a reduced search space by means of the PC algorithm or based on prior knowledge. In the second step, an iterative order MCMC scheme proceeds to optimize within the restricted search space and estimate the MAP graph. Sampling from the posterior distribution is implemented using either order or partition MCMC. The models and algorithms can handle both discrete and continuous data. The BiDAG package also provides an implementation of MCMC schemes for structure learning and sampling of dynamic Bayesian networks.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"88 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84899879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhi Zhao, Marco Banterle, L. Bottolo, S. Richardson, A. Lewin, M. Zucknick
In molecular biology, advances in high-throughput technologies have made it possible to study complex multivariate phenotypes and their simultaneous associations with high-dimensional genomic and other omics data, a problem that can be studied with high-dimensional multi-response regression, where the response variables are potentially highly correlated. For this purpose, we recently introduced several multivariate Bayesian variable and covariance selection models, e.g., Bayesian estimation methods for sparse seemingly unrelated regression for variable and covariance selection. Several variable selection priors have been implemented in this context, in particular the hotspot detection prior for latent variable inclusion indicators, which results in sparse variable selection for associations between predictors and multiple phenotypes. Here, we also propose an alternative, which uses a Markov random field (MRF) prior for incorporating prior knowledge about the dependence structure of the inclusion indicators. Inference of Bayesian seemingly unrelated regression (SUR) by Markov chain Monte Carlo methods is made computationally feasible by factorisation of the covariance matrix amongst the response variables. In this paper we present BayesSUR, an R package which allows the user to easily specify and run a range of different Bayesian SUR models, which have been implemented in C++ for computational efficiency. The R package allows the specification of the models in a modular way, where the user chooses the priors for variable selection and for covariance selection separately. We demonstrate the performance of sparse SUR models with the hotspot prior and spike-and-slab MRF prior on synthetic and real data sets representing eQTL or mQTL studies and in vitro anti-cancer drug screening studies as examples of typical applications.
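A minimal sketch of specifying a sparse SUR model with the hotspot variable-selection prior and a sparse covariance prior; Y and X are placeholder matrices, and the argument names reflect the modular choices described above but should be checked against the BayesSUR documentation:

    library(BayesSUR)

    # Placeholder data: 3 correlated responses, 20 candidate predictors
    set.seed(1)
    X <- matrix(rnorm(100 * 20), nrow = 100)
    Y <- matrix(rnorm(100 * 3),  nrow = 100)

    fit <- BayesSUR(Y = Y, X = X,
                    covariancePrior = "HIW",     # sparse (hyper-inverse Wishart) covariance
                    gammaPrior      = "hotspot", # hotspot prior on inclusion indicators
                    nIter = 10000, burnin = 5000,
                    outFilePath = "results/")
    summary(fit)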
{"title":"BayesSUR: An R Package for High-Dimensional Multivariate Bayesian Variable and Covariance Selection in Linear Regression","authors":"Zhi Zhao, Marco Banterle, L. Bottolo, S. Richardson, A. Lewin, M. Zucknick","doi":"10.18637/jss.v100.i11","DOIUrl":"https://doi.org/10.18637/jss.v100.i11","url":null,"abstract":"In molecular biology, advances in high-throughput technologies have made it possible to study complex multivariate phenotypes and their simultaneous associations with highdimensional genomic and other omics data, a problem that can be studied with highdimensional multi-response regression, where the response variables are potentially highly correlated. To this purpose, we recently introduced several multivariate Bayesian variable and covariance selection models, e.g., Bayesian estimation methods for sparse seemingly unrelated regression for variable and covariance selection. Several variable selection priors have been implemented in this context, in particular the hotspot detection prior for latent variable inclusion indicators, which results in sparse variable selection for associations between predictors and multiple phenotypes. Here, we also propose an alternative, which uses a Markov random field (MRF) prior for incorporating prior knowledge about the dependence structure of the inclusion indicators. Inference of Bayesian seemingly unrelated regression (SUR) by Markov chain Monte Carlo methods is made computationally feasible by factorisation of the covariance matrix amongst the response variables. In this paper we present BayesSUR, an R package, which allows the user to easily specify and run a range of different Bayesian SUR models, which have been implemented in C++ for computational efficiency. The R package allows the specification of the models in a modular way, where the user chooses the priors for variable selection and for covariance selection separately. We demonstrate the performance of sparse SUR models with the hotspot prior and spike-and-slab MRF prior on synthetic and real data sets representing eQTL or mQTL studies and in vitro anti-cancer drug screening studies as examples for typical applications.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"54 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86877519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Caimo, Lampros Bouranis, Robert W. Krause, N. Friel
Recent advances in computational methods for intractable models have made network data increasingly amenable to statistical analysis. Exponential random graph models (ERGMs) have emerged as one of the main families of models capable of capturing the complex dependence structure of network data in a wide range of applied contexts. The Bergm package for R has become a popular tool for carrying out Bayesian parameter inference, missing data imputation, model selection, and goodness-of-fit diagnostics for ERGMs. Over the last few years, the package has been considerably improved in terms of efficiency by adopting some of the state-of-the-art Bayesian computational methods for doubly-intractable distributions. Recently, version 5 of the package has been made available on CRAN, having undergone a substantial makeover that makes it more accessible and easier to use for practitioners. New functions include data augmentation procedures based on the approximate exchange algorithm for dealing with missing data, as well as adjusted pseudo-likelihood and pseudo-posterior procedures, which allow fast approximate inference of the ERGM parameter posterior and model evidence for networks with several thousand nodes.
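A minimal sketch of Bayesian ERGM estimation and goodness-of-fit checking with Bergm; the model terms and tuning values are illustrative assumptions rather than settings from the paper:

    library(Bergm)
    library(ergm)      # provides the florentine example networks

    data(florentine)   # loads flomarriage and flobusiness

    # Posterior sampling for a two-term ERGM via the (approximate) exchange algorithm
    post <- bergm(flomarriage ~ edges + kstar(2),
                  main.iters = 3000, aux.iters = 2500, burn.in = 500)

    # Bayesian goodness-of-fit diagnostics
    bgof(post)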
{"title":"Statistical Network Analysis with Bergm","authors":"A. Caimo, Lampros Bouranis, Robert W. Krause, N. Friel","doi":"10.18637/jss.v104.i01","DOIUrl":"https://doi.org/10.18637/jss.v104.i01","url":null,"abstract":"Recent advances in computational methods for intractable models have made network data increasingly amenable to statistical analysis. Exponential random graph models (ERGMs) emerged as one of the main families of models capable of capturing the complex dependence structure of network data in a wide range of applied contexts. The Bergm package for R has become a popular package to carry out Bayesian parameter inference, missing data imputation, model selection and goodness-of-fit diagnostics for ERGMs. Over the last few years, the package has been considerably improved in terms of efficiency by adopting some of the state-of-the-art Bayesian computational methods for doubly-intractable distributions. Recently, version 5 of the package has been made available on CRAN having undergone a substantial makeover, which has made it more accessible and easy to use for practitioners. New functions include data augmentation procedures based on the approximate exchange algorithm for dealing with missing data, adjusted pseudo-likelihood and pseudo-posterior procedures, which allow for fast approximate inference of the ERGM parameter posterior and model evidence for networks on several thousands nodes.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"74 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78008212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}