
Journal of Statistical Software: Latest Articles

Bambi: A Simple Interface for Fitting Bayesian Linear Models in Python
IF 5.8; Zone 2, Computer Science; Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2020-12-19; DOI: 10.18637/jss.v103.i15
Tomás Capretto, Camen Piho, Ravi Kumar, Jacob Westfall, T. Yarkoni, O. A. Martin
The popularity of Bayesian statistical methods has increased dramatically in recent years across many research areas and industrial applications. This is the result of a variety of methodological advances, faster and cheaper hardware, and the development of new software tools. Here we introduce an open source Python package named Bambi (BAyesian Model Building Interface) that is built on top of the PyMC probabilistic programming framework and the ArviZ package for exploratory analysis of Bayesian models. Bambi makes it easy to specify complex generalized linear hierarchical models using a formula notation similar to those found in R. We demonstrate Bambi's versatility and ease of use with a few examples spanning a range of common statistical models, including multiple regression, logistic regression, and mixed-effects modeling with crossed group-specific effects. Additionally, we discuss how automatic priors are constructed. Finally, we conclude with a discussion of our plans for the future development of Bambi.
Citations: 40
Continuous Ordinal Regression for Analysis of Visual Analogue Scales: The R Package ordinalCont
IF 5.8; Zone 2, Computer Science; Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2020-12-05; DOI: 10.18637/jss.v096.i08
M. Manuguerra, G. Heller, Jun Ma
This paper introduces the R package ordinalCont, which implements an ordinal regression framework for response variables which are recorded on a visual analogue scale (VAS). This scale is used when recording subjects' perception of an intangible quantity such as pain, anxiety or quality of life, and consists of a mark made on a linear scale. We implement continuous ordinal regression models for VAS as the appropriate method of analysis for such responses, and introduce smoothing terms and random effects in the linear predictor. The model parameters are estimated using constrained optimization of the penalized likelihood and the penalty parameters are automatically selected via maximization of their marginal likelihood. The estimation algorithm is shown to perform well, in a simulation study. Two examples of application are given: the first involves the analysis of pain outcomes in a clinical trial for laser treatment for chronic neck pain; the second is an analysis of quality of life outcomes in a clinical trial for chemotherapy for the treatment of breast cancer.
Citations: 17
fastnet: An R Package for Fast Simulation and Analysis of Large-Scale Social Networks
IF 5.8; Zone 2, Computer Science; Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2020-12-05; DOI: 10.2139/ssrn.3121725
Xu Dong, Luis E. Castro, N. I. Shaikh
Traditional tools and software for social network analysis are seldom scalable and/or fast. This paper provides an overview of an R package called fastnet, a tool for scaling and speeding up the simulation and analysis of large-scale social networks. fastnet uses multi-core processing and sub-graph sampling algorithms to achieve the desired scale-up and speed-up. Simple examples, usages, and comparisons of scale-up and speed-up as compared to other R packages, i.e., igraph and statnet, are presented.
Citations: 3
Generating Optimal Designs for Discrete Choice Experiments in R: The idefix Package
IF 5.8; Zone 2, Computer Science; Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2020-11-29; DOI: 10.18637/jss.v096.i03
Frits Traets, Danielle Sanchez, M. Vandebroek
Discrete choice experiments are widely used in a broad area of research fields to capture the preference structure of respondents. The design of such experiments will determine to a large extent the accuracy with which the preference parameters can be estimated. This paper presents a new R package, called idefix, which enables users to generate optimal designs for discrete choice experiments. Besides Bayesian D-efficient designs for the multinomial logit model, the package includes functions to generate Bayesian adaptive designs which can be used to gather data for the mixed logit model. In addition, the package provides the necessary tools to set up actual surveys and collect empirical data. After data collection, idefix can be used to transform the data into the necessary format in order to use existing estimation software in R.
Citations: 23
Performing Parallel Monte Carlo and Moment Equations Methods for Itô and Stratonovich Stochastic Differential Systems: R Package Sim.DiffProc
IF 5.8; Zone 2, Computer Science; Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2020-11-29; DOI: 10.18637/jss.v096.i02
A. Guidoum, Kamal Boukhetala
We introduce Sim.DiffProc, an R package for symbolic and numerical computations on scalar and multivariate systems of stochastic differential equations (SDEs). It provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of these systems in both forms, Itô and Stratonovich. One of Sim.DiffProc's key features is a parallel implementation of the Monte Carlo method for the iterative evaluation and approximation of a quantity of interest at a fixed time on SDEs, running on multiple processors of a single machine or on a cluster of computers, which is an important tool for improving capacity and speeding up calculations. We also provide an easy-to-use interface for symbolic calculation and numerical approximation of the first and central second-order moments of SDEs (i.e., mean, variance and covariance), by solving a system of ordinary differential equations, which yields insights into the dynamics of stochastic systems. The final result objects of the Monte Carlo and moment-equation methods can be presented as LaTeX math expressions and LaTeX tables. Furthermore, we illustrate various features of the package by proposing a general bivariate nonlinear dynamic system of Haken-Zwanzig, driven by additive, linear and nonlinear multiplicative noises. In addition, we consider the particular case of a scalar SDE driven by three independent Wiener processes. The Monte Carlo simulation thereof is obtained through a transformation to a system of three equations. We also study some important applications of SDEs in different fields.
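As a rough illustration of the Monte Carlo idea described in this abstract, the following Python sketch estimates the mean of an Itô SDE at a fixed time with the Euler-Maruyama scheme. It is not Sim.DiffProc code and does not use its API: the function name `euler_maruyama_mean`, its parameters, and the serial (non-parallel) loop are all assumptions made for this example.

```python
import math
import random


def euler_maruyama_mean(drift, diffusion, x0, t_end, n_steps, n_paths, seed=0):
    """Monte Carlo estimate of E[X(t_end)] for the Ito SDE
    dX = drift(X) dt + diffusion(X) dW, using Euler-Maruyama steps.

    Hypothetical helper for illustration only; runs serially.
    """
    rng = random.Random(seed)          # seeded generator for reproducibility
    dt = t_end / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, sqrt_dt)   # Wiener increment ~ N(0, dt)
            x += drift(x) * dt + diffusion(x) * dw
        total += x
    return total / n_paths


# Example: dX = -X dt + 0.5 dW with X(0) = 1.
# The moment equation for the mean is m'(t) = -m(t), so m(1) = exp(-1).
estimate = euler_maruyama_mean(lambda x: -x, lambda x: 0.5, 1.0, 1.0, 100, 2000)
exact_mean = math.exp(-1.0)
```

For this linear example the mean obeys an ordinary differential equation with a closed-form solution, which gives a simple check on the simulated value; the sample paths are independent, which is why this kind of computation parallelizes well.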
Citations: 6
Gene-Based Methods to Detect Gene-Gene Interaction in R: The GeneGeneInteR Package
IF 5.8; Zone 2, Computer Science; Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2020-10-12; DOI: 10.18637/jss.v095.i12
M. Emily, Nicolas Sounac, F. Kroell, Magalie Houée-Bigot
GeneGeneInteR is an R package dedicated to the detection of an association between a case-control phenotype and the interaction between two sets of biallelic markers (single nucleotide polymorphisms or SNPs) in case-control genome-wide association studies. The development of statistical procedures for searching gene-gene interaction at the SNP-set level has recently grown in popularity, as these methods confer advantages in both statistical power and biological interpretation. However, all these methods have been implemented in in-house software, most of which is available only on request from the authors and at best has a web interface. Since the implementation of these methods is not straightforward, there is a need for a user-friendly tool to perform gene-based gene-gene interaction analysis. The purpose of GeneGeneInteR is to propose a collection of tools for all the steps involved in gene-based gene-gene interaction testing in case-control association studies. Illustrated by an example of a dataset related to rheumatoid arthritis, this paper details the implementation of the functions available in GeneGeneInteR to perform an analysis of a collection of SNP sets. Such an analysis aims at addressing the complete statistical pipeline going from data importation to the visualization of the results through data manipulation and statistical analysis.
Citations: 3
Estimation of Random Utility Models in R: The mlogit Package
IF 5.8; Zone 2, Computer Science; Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2020-10-07; DOI: 10.18637/jss.v095.i11
Y. Croissant
mlogit is a package for R which enables the estimation of random utility models with choice situation and/or alternative specific variables. The main extensions of the basic multinomial model (heteroscedastic, nested and random parameter models) are implemented.
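To make the "basic multinomial model" mentioned in this abstract concrete, here is a minimal Python sketch of multinomial logit choice probabilities, P_j = exp(V_j) / sum_k exp(V_k), for one choice situation. It is not mlogit code; the function name `mnl_probabilities` and the plain list of systematic utilities are assumptions for illustration, and no estimation is performed.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities for one choice situation.

    `utilities` holds the systematic utility V_j of each alternative
    (hypothetical input format). Subtracting the maximum before
    exponentiating avoids overflow and leaves the probabilities unchanged.
    """
    v_max = max(utilities)
    weights = [math.exp(v - v_max) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Three alternatives with equal utility are equally likely to be chosen.
equal = mnl_probabilities([0.0, 0.0, 0.0])
# Raising one alternative's utility by 1 shifts probability toward it.
two_way = mnl_probabilities([1.0, 0.0])
```

The heteroscedastic, nested and random parameter extensions named in the abstract relax the assumptions behind this closed form and are not covered by the sketch.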
Citations: 96
Pseudo-Ranks: How to Calculate Them Efficiently in R
IF 5.8; Zone 2, Computer Science; Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2020-10-07; DOI: 10.18637/jss.v095.c01
Martin Happ, G. Zimmermann, E. Brunner, A. Bathke
Many popular nonparametric inferential methods are based on ranks. Among the most commonly used and best-known tests are, for example, the Wilcoxon-Mann-Whitney test for two independent samples and the Kruskal-Wallis test for multiple independent groups. However, it has recently become clear that the use of ranks may lead to paradoxical results when there are more than two groups. Luckily, these problems can be avoided simply by using pseudo-ranks instead of ranks. These pseudo-ranks, however, suffer from being (a) at first less intuitive and not as straightforward to interpret, and (b) computationally much more expensive to calculate. The computational cost has been prohibitive, for example, for large-scale simulative evaluations or application of resampling-based pseudo-rank procedures. In this paper, we provide different algorithms to calculate pseudo-ranks efficiently in order to solve problem (b) and thus render it possible to overcome the current limitations of procedures based on pseudo-ranks.
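For readers unfamiliar with pseudo-ranks, the sketch below computes them directly from the usual definition: the pseudo-rank of an observation x is 1/2 + N * G(x), where N is the total sample size and G is the unweighted mean of the groups' normalized empirical distribution functions. This is a deliberately naive O(N^2) Python illustration, not one of the efficient algorithms the paper proposes; the function name `pseudo_ranks` and the list-of-lists input are assumptions for this example.

```python
def pseudo_ranks(groups):
    """Pseudo-ranks from the definition, for a list of samples (one per group).

    Naive quadratic-time version for illustration only.
    """
    n_groups = len(groups)
    n_total = sum(len(g) for g in groups)

    def count(u):
        # Normalized counting function: 0 below, 1/2 at a tie, 1 above.
        if u < 0:
            return 0.0
        return 0.5 if u == 0 else 1.0

    result = []
    for group in groups:
        ranks = []
        for x in group:
            # Unweighted mean of the per-group empirical distribution functions.
            g_hat = sum(
                sum(count(x - y) for y in other) / len(other) for other in groups
            ) / n_groups
            ranks.append(0.5 + n_total * g_hat)
        result.append(ranks)
    return result


balanced = pseudo_ranks([[1, 2], [3, 4]])      # equal group sizes
unbalanced = pseudo_ranks([[1], [2, 3, 4]])    # unequal group sizes
```

With equal group sizes the pseudo-ranks coincide with ordinary mid-ranks; with unequal sizes they differ because every group contributes equally to G regardless of how many observations it has.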
Citations: 6
survHE: Survival Analysis for Health Economic Evaluation and Cost-Effectiveness Modeling
IF 5.8; Zone 2, Computer Science; Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2020-10-07; DOI: 10.18637/jss.v095.i14
G. Baio
Survival analysis features heavily as an important part of health economic evaluation, an increasingly important component of medical research. In this setting, it is important to estimate the mean time to the survival endpoint using limited information (typically from randomized trials) and thus it is useful to consider parametric survival models. In this paper, we review the features of the R package survHE, specifically designed to wrap several tools to perform survival analysis for economic evaluation. In particular, survHE embeds both a standard, frequentist analysis (through the R package flexsurv) and a Bayesian approach, based on Hamiltonian Monte Carlo (via the R package rstan) or integrated nested Laplace approximation (with the R package INLA). Using this composite approach, we obtain maximum flexibility and are able to pre-compile a wide range of parametric models, with a view of simplifying the modelers' work and allowing them to move away from non-optimal work flows, including spreadsheets (e.g., Microsoft Excel).
Citations: 19
Various Versatile Variances: An Object-Oriented Implementation of Clustered Covariances in R
IF 5.8; Zone 2, Computer Science; Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2020-10-07; DOI: 10.18637/JSS.V095.I01
A. Zeileis, Susanne Köll, N. Graham
Clustered covariances or clustered standard errors are very widely used to account for correlated or clustered data, especially in economics, political sciences, or other social sciences. They are employed to adjust the inference following estimation of a standard least-squares regression or generalized linear model estimated by maximum likelihood. Although many publications just refer to "the" clustered standard errors, there is a surprisingly wide variety of clustered covariances particularly due to different flavors of bias corrections. Furthermore, while the linear regression model is certainly the most important application case, the same strategies can be employed in more general models (e.g. for zero-inflated, censored, or limited responses). In R, functions for covariances in clustered or panel models have been somewhat scattered or available only for certain modeling functions, notably the (generalized) linear regression model. In contrast, an object-oriented approach to "robust" covariance matrix estimation - applicable beyond lm() and glm() - is available in the sandwich package but has been limited to the case of cross-section or time series data. Now, this shortcoming has been corrected in sandwich (starting from version 2.4.0): Based on methods for two generic functions (estfun() and bread()), clustered and panel covariances are now provided in vcovCL(), vcovPL(), and vcovPC(). These are directly applicable to models from many packages, e.g., including MASS, pscl, countreg, betareg, among others. Some empirical illustrations are provided as well as an assessment of the methods' performance in a simulation study.
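The sandwich structure behind clustered covariances can be sketched in a few lines of Python for the simplest possible case: least squares with a single regressor and no intercept, and no bias correction. This is not the sandwich package's implementation and does not reproduce vcovCL() or its scaling and adjustment options; the function names `ols_slope` and `clustered_variance` and the scalar setup are assumptions chosen to keep the example dependency-free.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Least-squares slope for the no-intercept model y = beta * x + error."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def clustered_variance(x, y, clusters):
    """Cluster-robust variance of the slope, without any bias correction.

    Scalar sandwich: bread * meat * bread, where the "meat" sums the
    squared within-cluster totals of the scores x_i * residual_i.
    """
    beta = ols_slope(x, y)
    bread = 1.0 / sum(a * a for a in x)
    score_sums = defaultdict(float)
    for a, b, g in zip(x, y, clusters):
        score_sums[g] += a * (b - beta * a)   # accumulate scores per cluster
    meat = sum(s * s for s in score_sums.values())
    return bread * meat * bread


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, 5.0]
var_two_clusters = clustered_variance(x, y, [0, 0, 1, 1])
var_singletons = clustered_variance(x, y, [0, 1, 2, 3])   # one cluster per row
```

When every observation forms its own cluster, the same formula reduces to the familiar heteroscedasticity-robust variance without small-sample adjustment, which shows how the choice of clustering changes only the "meat" of the sandwich.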
Citations: 290