
Journal of Statistical Software: Latest Articles

deepregression: a Flexible Neural Network Framework for Semi-Structured Deep Distributional Regression
IF 5.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2021-04-06 | DOI: 10.18637/jss.v105.i02
D. Rügamer, Ruolin Shen, Christina Bukas, Lisa Barros de Andrade e Sousa, Dominik Thalmeier, N. Klein, Chris Kolb, Florian Pfisterer, Philipp Kopper, B. Bischl, C. Müller
In this paper we describe the implementation of semi-structured deep distributional regression, a flexible framework for learning conditional distributions by combining additive regression models and deep networks. Our implementation encompasses (1) a modular neural network building system based on the deep learning library TensorFlow for the fusion of various statistical and deep learning approaches, (2) an orthogonalization cell that allows an interpretable combination of different subnetworks, and (3) the pre-processing steps necessary to set up such models. The software package allows users to define models in a user-friendly manner via a formula interface inspired by classical statistical model frameworks such as mgcv. The package's modular design and functionality provide a unique resource both for the scalable estimation of complex statistical models and for combining approaches from deep learning and statistics. This enables state-of-the-art predictive performance while retaining the indispensable interpretability of classical statistical models.
Citations: 15
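The deepregression abstract describes an orthogonalization cell whose job is to keep the structured, interpretable effects identifiable after a deep subnetwork is added. The NumPy sketch below shows the idea under one assumption: the cell projects the deep subnetwork's latent features onto the orthogonal complement of the column space of the structured design matrix. The function orthogonalize and the arrays X and U are hypothetical. This is not the package's TensorFlow implementation.

```python
import numpy as np


def orthogonalize(U, X):
    """Return (I - P_X) U, where P_X projects onto the column space of X.

    A least-squares solve is used instead of forming P_X explicitly.
    """
    # Coefficients of the projection of each column of U onto span(X)
    coef, *_ = np.linalg.lstsq(X, U, rcond=None)
    # The residuals are orthogonal to every column of X
    return U - X @ coef


rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
# Structured design matrix: intercept plus one covariate (hypothetical)
X = np.column_stack([np.ones(n), x])
# Latent features of a hypothetical deep subnetwork; the first one mimics x
U = np.column_stack([2.0 * x + rng.normal(size=n), rng.normal(size=n)])

U_orth = orthogonalize(U, X)
# Largest inner product between structured columns and orthogonalized features
print(np.abs(X.T @ U_orth).max())
```

U_orth is orthogonal to every column of X. As a result, an ordinary least-squares fit on the stacked design [X, U_orth] gives X the same coefficients as a fit on X alone, so the deep part cannot absorb the structured effects.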
mosum: A Package for Moving Sums in Change-Point Analysis
IF 5.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2021-03-19 | DOI: 10.18637/JSS.V097.I08
Alexander Meier, C. Kirch, Haeran Cho
Time series data, i.e., temporally ordered data, is routinely collected and analysed in many fields of natural science, economics, technology and medicine, where it is important to verify the assumption of stochastic stationarity before modeling the data. Nonstationarities in the data are often attributed to structural changes, with the segments between adjacent change-points being approximately stationary. A particularly important, and thus widely studied, problem in statistics and signal processing is to detect changes in the mean at unknown time points. In this paper, we present the R package mosum, which implements elegant and mathematically well-justified procedures for the multiple mean change problem using moving sum statistics.
Citations: 32
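The sketch below makes the moving sum idea concrete with a single-bandwidth MOSUM statistic. For a candidate split point k it computes T_k = |sum(x[k:k+G]) - sum(x[k-G:k])| / (sigma * sqrt(2G)), which compares the G observations to the right of k with the G observations to the left. It then keeps every k where T_k exceeds a threshold and is also the largest value within G positions on either side. Several parts are simplifying assumptions made for illustration: the robust noise-scale estimate, the fixed threshold and this local-maximum rule. The function names are hypothetical, and the sketch does not reproduce the mosum package's API or its critical values.

```python
import numpy as np


def mosum_stat(x, G, sigma):
    """MOSUM statistic for split points k = G, ..., n - G (0-based).

    T_k = |sum(x[k:k+G]) - sum(x[k-G:k])| / (sigma * sqrt(2 * G))
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = np.concatenate([[0.0], np.cumsum(x)])  # c[i] = sum(x[:i])
    k = np.arange(G, n - G + 1)
    right = c[k + G] - c[k]
    left = c[k] - c[k - G]
    return k, np.abs(right - left) / (sigma * np.sqrt(2 * G))


def detect_changes(x, G, threshold):
    """Split points where T_k exceeds threshold and is the max within +-G."""
    x = np.asarray(x, dtype=float)
    # Robust noise scale from first differences (barely affected by a few jumps)
    sigma = np.median(np.abs(np.diff(x))) / (0.6745 * np.sqrt(2))
    k, T = mosum_stat(x, G, sigma)
    changes = []
    for i in range(len(T)):
        lo, hi = max(0, i - G), min(len(T), i + G + 1)
        if T[i] > threshold and T[i] == T[lo:hi].max():
            changes.append(int(k[i]))
    return changes


rng = np.random.default_rng(1)
# Piecewise-constant mean with changes at positions 150 and 250
signal = np.repeat([0.0, 2.0, 0.5], [150, 100, 150])
x = signal + rng.normal(scale=0.5, size=signal.size)
print(detect_changes(x, G=40, threshold=5.0))
```

The simulated mean changes at positions 150 and 250, so the reported split points should fall near those indices. A larger bandwidth G gives more power against small jumps, but it makes change points that lie close together harder to separate.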
svars: An R Package for Data-Driven Identification in Multivariate Time Series Analysis
IF 5.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2021-03-19 | DOI: 10.18637/JSS.V097.I05
Alexander Lange, B. Dalheimer, H. Herwartz, Simone Maxand
Structural vector autoregressive (SVAR) models are frequently applied to trace the contemporaneous linkages among (macroeconomic) variables back to an interplay of orthogonal structural shocks. Under Gaussianity, the structural parameters are unidentified without additional (often external and not data-based) information. In contrast, the often reasonable assumption of heteroskedastic and/or non-Gaussian model disturbances makes it possible to identify unique structural shocks. We describe the R package svars, which implements statistical identification techniques that can be either heteroskedasticity-based or independence-based. Moreover, it includes a rich variety of analysis tools that are well known in the SVAR literature. In addition to a comprehensive review of the theoretical background, we provide a detailed description of the associated R functions. Furthermore, a macroeconomic application serves as a step-by-step guide on how to apply these functions to the identification and interpretation of structural VAR models.
Citations: 16
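The heteroskedasticity-based identification mentioned in the svars abstract can be illustrated at the population level with two volatility regimes. Assume the reduced-form covariances are S1 = B B' in the first regime and S2 = B Lambda B' in the second, where Lambda is diagonal with distinct entries. Under that assumption, B can be recovered up to column signs and ordering from a Cholesky factor of S1 followed by a symmetric eigendecomposition. The matrices and the function name below are hypothetical. The sketch also skips the estimation from data that the package itself performs.

```python
import numpy as np


def identify_by_volatility_shift(S1, S2):
    """Recover B and diag(Lambda) from S1 = B B' and S2 = B Lambda B'.

    The result is unique up to column signs and ordering when the
    diagonal entries of Lambda are distinct.
    """
    L = np.linalg.cholesky(S1)        # S1 = L L'
    Linv = np.linalg.inv(L)
    M = Linv @ S2 @ Linv.T            # M = P Lambda P' with P = L^-1 B orthogonal
    lam, Q = np.linalg.eigh(M)        # eigenvalues in ascending order
    return L @ Q, lam                 # B = L Q


# Hypothetical structural impact matrix and relative variances in regime 2
B_true = np.array([[1.0, 0.5, 0.0],
                   [0.3, 1.0, 0.2],
                   [-0.4, 0.1, 1.0]])
Lam_true = np.diag([0.5, 2.0, 4.0])
S1 = B_true @ B_true.T                # reduced-form covariance, regime 1
S2 = B_true @ Lam_true @ B_true.T     # reduced-form covariance, regime 2

B_hat, lam_hat = identify_by_volatility_shift(S1, S2)
print(np.round(lam_hat, 6))
print(np.round(B_hat, 6))
```

Replacing S1 and S2 with estimated covariances turns the same algebra into an estimator of B. If two entries of Lambda coincide, the corresponding columns of B are no longer unique, which is why the variance shifts must differ across shocks.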
FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs
IF 5.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2021-03-01 | Epub Date: 2021-03-19 | DOI: 10.18637/jss.v097.i07
Yun-Hee Choi, Laurent Briollais, Wenqing He, Karen Kopciuk

FamEvent is a comprehensive R package for simulating and modelling age at disease onset in families carrying a rare gene mutation. The package can simulate complex family data for variable time-to-event outcomes under three common family study designs (population, high-risk clinic and multi-stage) with various levels of missing genetic information among family members. Residual familial correlation can be induced through the inclusion of a frailty term or a second gene. Disease-gene carrier probabilities are evaluated either assuming Mendelian transmission or empirically from the data. When genetic information on the disease gene is missing, an Expectation-Maximization algorithm is employed to calculate the carrier probabilities. Penetrance model functions, with an ascertainment correction adapted to the sampling design, provide age-specific cumulative disease risks by sex, mutation status and other covariates for both simulated and real data. Robust standard errors and 95% confidence intervals are available for these estimates. Plots of pedigrees and of penetrance functions based on the fitted model provide graphical displays to evaluate and summarize the models.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8427460/pdf/nihms-1735562.pdf
Citations: 0
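The age-specific penetrance in the FamEvent abstract is a cumulative risk P(T <= t | sex, carrier status). The sketch below computes such risks for the four sex and carrier combinations. It assumes a Weibull proportional hazards model with cumulative baseline hazard (lam * t)^rho, and all parameter values are hypothetical. It leaves out the ascertainment correction, the frailty term and the missing-genotype handling that the abstract describes. The function name is illustrative and not part of the package.

```python
import numpy as np


def penetrance(age, male, carrier, lam, rho, beta_sex, beta_gene):
    """Cumulative risk P(T <= age | sex, carrier) under a Weibull PH model.

    Cumulative baseline hazard H0(t) = (lam * t)**rho; sex and carrier
    status multiply the hazard by exp(beta_sex * male + beta_gene * carrier).
    """
    eta = beta_sex * male + beta_gene * carrier
    t = np.asarray(age, dtype=float)
    return 1.0 - np.exp(-((lam * t) ** rho) * np.exp(eta))


ages = np.array([30, 50, 70])
# Hypothetical parameter values, chosen only for illustration
params = dict(lam=0.01, rho=3.0, beta_sex=0.3, beta_gene=2.0)
for male in (0, 1):
    for carrier in (0, 1):
        risk = penetrance(ages, male, carrier, **params)
        print(f"male={male} carrier={carrier}:", np.round(risk, 3))
```

In the package, parameters such as these are estimated from family data with a likelihood corrected for how the families were ascertained. Here they are simply fixed to show how the fitted model translates into age-specific risks.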
intRinsic: An R Package for Model-Based Estimation of the Intrinsic Dimension of a Dataset
IF 5.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2021-02-23 | DOI: 10.18637/jss.v106.i09
Francesco Denti
This article illustrates intRinsic, an R package that implements novel, state-of-the-art likelihood-based estimators of the intrinsic dimension of a dataset, an essential quantity for most dimensionality reduction techniques. To make these novel estimators easily accessible, the package contains a small number of high-level functions that rely on a broader set of efficient low-level routines. Generally speaking, intRinsic encompasses models that fall into two categories: homogeneous and heterogeneous intrinsic dimension estimators. The first category contains the two-nearest-neighbors estimator, a method derived from the distributional properties of the ratios of the distances between each data point and its two closest neighbors. The functions dedicated to this method carry out inference under both the frequentist and Bayesian frameworks. The second category contains the heterogeneous intrinsic dimension algorithm, a Bayesian mixture model for which an efficient Gibbs sampler is implemented. After presenting the theoretical background, we demonstrate the performance of the models on simulated datasets, which facilitates the exposition because the validity of the results can be assessed immediately. Then, we employ the package to study the intrinsic dimension of the Alon dataset, obtained from a famous microarray experiment. Finally, we show how the estimation of homogeneous and heterogeneous intrinsic dimensions allows us to gain valuable insights into the topological structure of a dataset.
Citations: 4
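The two-nearest-neighbors estimator rests on one distributional fact. For locally uniform data of intrinsic dimension d, the ratio mu_i = r2_i / r1_i of the distances from point i to its second and first nearest neighbors follows a Pareto distribution with density d * mu^(-d-1) on mu >= 1. The NumPy sketch below uses this fact to compute the maximum likelihood estimate d_hat = n / sum(log mu_i) and a conjugate Gamma posterior for d. Three parts are assumptions made for illustration and are not intRinsic's API: the brute-force distance computation, the Gamma(1, 1) prior and all function names.

```python
import numpy as np


def two_nn_ratios(X):
    """Return mu_i = r2_i / r1_i for every point (brute-force distances)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(dist, np.inf)        # a point is not its own neighbor
    r = np.sort(dist, axis=1)[:, :2]      # first and second neighbor distances
    return r[:, 1] / r[:, 0]


def two_nn_mle(mu):
    """MLE of d when mu_i ~ Pareto(1, d), i.e. density d * mu**(-d - 1)."""
    return len(mu) / np.sum(np.log(mu))


def two_nn_posterior(mu, a=1.0, b=1.0):
    """Gamma(a, b) prior (shape, rate) on d gives a Gamma posterior."""
    shape = a + len(mu)
    rate = b + np.sum(np.log(mu))
    return shape, rate                    # posterior mean is shape / rate


rng = np.random.default_rng(2)
Z = rng.uniform(size=(500, 2))            # 2-D latent coordinates
A = rng.normal(size=(2, 5))               # random linear embedding into R^5
X = Z @ A

mu = two_nn_ratios(X)
d_hat = two_nn_mle(mu)
shape, rate = two_nn_posterior(mu)
print(round(d_hat, 2), round(shape / rate, 2))
```

The points are uniform on a square mapped linearly into five dimensions. Both the MLE and the posterior mean should therefore come out close to 2, even though the ambient dimension is 5.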
sensobol: An R Package to Compute Variance-Based Sensitivity Indices
IF 5.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2021-01-22 | DOI: 10.18637/jss.v102.i05
A. Puy, S. L. Piano, Andrea Saltelli, S. Levin
The R package sensobol provides several functions to conduct variance-based uncertainty and sensitivity analysis, from the estimation of sensitivity indices to the visual representation of the results. It implements several state-of-the-art first- and total-order estimators and allows the computation of up to third-order effects, as well as of the approximation error, in a swift and user-friendly way. Its flexibility also makes it appropriate for models with either a scalar or a multivariate output. We illustrate its functionality by conducting a variance-based sensitivity analysis of three classic models: the Sobol' (1998) G function, the logistic population growth model of Verhulst (1845), and the spruce budworm and forest model of Ludwig, Jones and Holling (1976).
Citations: 30
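The Sobol' G function is a convenient test case for the variance-based indices that sensobol estimates, because its first-order indices S_i and total-order indices T_i are known in closed form. The NumPy sketch below estimates S_i with the Saltelli et al. (2010) first-order estimator and T_i with the Jansen (1999) total-order estimator, using plain Monte Carlo sampling. It then compares both with the analytic values. These estimator and sampling choices, and the function names, are assumptions for illustration. sensobol's own estimators, sampling designs and function signatures are not reproduced here.

```python
import numpy as np


def sobol_g(X, a):
    """Sobol' G function: prod_i (|4 x_i - 2| + a_i) / (1 + a_i)."""
    return np.prod((np.abs(4.0 * X - 2.0) + a) / (1.0 + a), axis=1)


def sobol_indices(f, k, N, rng):
    """Saltelli (2010) first-order and Jansen (1999) total-order estimates."""
    A = rng.uniform(size=(N, k))
    B = rng.uniform(size=(N, k))
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))
    S, T = np.empty(k), np.empty(k)
    for i in range(k):
        ABi = A.copy()
        ABi[:, i] = B[:, i]               # A with column i taken from B
        fABi = f(ABi)
        S[i] = np.mean(fB * (fABi - fA)) / var
        T[i] = 0.5 * np.mean((fA - fABi) ** 2) / var
    return S, T


def g_analytic(a):
    """Closed-form first- and total-order indices of the G function."""
    Vi = (1.0 / 3.0) / (1.0 + a) ** 2
    V = np.prod(1.0 + Vi) - 1.0
    S = Vi / V
    T = Vi * np.prod(1.0 + Vi) / (1.0 + Vi) / V
    return S, T


a = np.array([0.0, 1.0, 9.0])
rng = np.random.default_rng(3)
S_hat, T_hat = sobol_indices(lambda X: sobol_g(X, a), k=3, N=2 ** 15, rng=rng)
S_true, T_true = g_analytic(a)
print(np.round(S_hat, 3), np.round(S_true, 3))
print(np.round(T_hat, 3), np.round(T_true, 3))
```

Smaller values of a_i make input i more influential. The first input (a = 0) therefore dominates, and the third (a = 9) contributes almost nothing.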
Nonparametric Machine Learning and Efficient Computation with Bayesian Additive Regression Trees: The BART R Package
IF 5.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2021-01-14 | DOI: 10.18637/JSS.V097.I01
R. Sparapani, Charles Spanbauer, R. McCulloch
In this article, we introduce the BART R package; BART is an acronym for Bayesian additive regression trees. BART is a Bayesian nonparametric, machine learning, ensemble predictive modeling method for continuous, binary, categorical and time-to-event outcomes. Furthermore, BART is a tree-based, black-box method which fits the outcome to an arbitrary random function, f, of the covariates. The BART technique is relatively computationally efficient compared to its competitors, but large sample sizes can be demanding. Therefore, the BART package includes efficient state-of-the-art implementations for continuous, binary, categorical and time-to-event outcomes that can take advantage of modern off-the-shelf hardware and software multi-threading technology. The BART package is written in C++ for both programmer and execution efficiency. It takes advantage of multi-threading via forking, as provided by the parallel package, and via OpenMP when available and supported by the platform. The ensemble of binary trees produced by a BART fit can be stored and reused later via the R predict function. In addition to being usable as an R package, the installed BART routines can be called directly from C++. The BART package provides the tools for your BART toolbox.
Citations: 72
The R Package forestinventory: Design-Based Global and Small Area Estimations for Multiphase Forest Inventories
IF 5.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2021-01-14 | DOI: 10.18637/JSS.V097.I04
Andreas Hill, Alexander Massey, D. Mandallaz
Forest inventories provide reliable evidence-based information to assess the state and development of forests over time. They typically consist of a random sample of plot locations in the forest that are assessed individually by field crews. Because these terrestrial campaigns are expensive, remote sensing information, which is available in large quantities at low cost, is frequently incorporated into the estimation process to reduce inventory costs or improve estimation precision. To this end, multiphase forest inventory methods (e.g., double- and triple-sampling regression estimators) have proved to be efficient. While these methods have been successfully applied in practice, open-source software implementing them has been scarce, if not nonexistent. The R package forestinventory provides a comprehensive set of global and small area regression estimators for multiphase forest inventories under simple and cluster sampling. The implemented methods have been demonstrated in various scientific studies ranging from small to large scale forest inventories, and can be used for post-stratification, regression and regression within strata. This article gives an extensive review of the mathematical theory of this family of design-based estimators, puts them into a common framework of forest inventory scenarios and demonstrates their application in the R environment.
Citations: 5
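The double-sampling regression estimators mentioned in the forestinventory abstract combine two samples. Phase 1 is a large, cheap sample with remote sensing information. Phase 2 is a smaller subsample with terrestrial measurements. This sketch assumes the simplest global form of the estimator: the mean of the model predictions over phase 1 plus the mean residual over phase 2. It uses simple random sampling, one linear model fitted on the terrestrial plots and a simulated population. It does not cover variance estimation, cluster sampling or small area estimation, and all names are hypothetical rather than taken from the package.

```python
import numpy as np


def two_phase_regression_mean(x_s1, x_s2, y_s2):
    """Two-phase (double-sampling) regression estimate of the mean of y.

    x_s1       : auxiliary variable on the large phase-1 sample
    x_s2, y_s2 : auxiliary and terrestrial values on the phase-2 subsample
    """
    # Linear model y = b0 + b1 * x fitted on the terrestrial plots
    D2 = np.column_stack([np.ones_like(x_s2), x_s2])
    beta, *_ = np.linalg.lstsq(D2, y_s2, rcond=None)
    pred_s1 = beta[0] + beta[1] * x_s1             # predictions on phase 1
    resid_s2 = y_s2 - (beta[0] + beta[1] * x_s2)   # residuals on phase 2
    return pred_s1.mean() + resid_s2.mean()


rng = np.random.default_rng(5)
# Hypothetical population: x mimics a remote sensing metric, y a plot-level value
N = 10_000
x_pop = rng.uniform(0.0, 30.0, size=N)
y_pop = 50.0 + 8.0 * x_pop + rng.normal(scale=20.0, size=N)

s1 = rng.choice(N, size=2000, replace=False)   # phase 1: auxiliary data only
s2 = rng.choice(s1, size=200, replace=False)   # phase 2: terrestrial subsample
est = two_phase_regression_mean(x_pop[s1], x_pop[s2], y_pop[s2])
# Regression estimate, true population mean, terrestrial-only sample mean
print(round(est, 2), round(y_pop.mean(), 2), round(y_pop[s2].mean(), 2))
```

The model includes an intercept and is fitted on the phase-2 plots, so the mean residual is numerically zero in this sketch. The residual correction matters when the model is fitted on external data instead.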
microsynth: Synthetic Control Methods for Disaggregated and Micro-Level Data in R
IF 5.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2021-01-14 | DOI: 10.18637/JSS.V097.I02
Michael W Robbins, Steven Davenport
The R package microsynth has been developed for implementation of the synthetic control methodology for comparative case studies involving micro- or meso-level data. The methodology implemented within microsynth is designed to assess the efficacy of a treatment or intervention within a well-defined geographic region that is itself a composite of several smaller regions (where data are available at the more granular level for comparison regions as well). The effect of the intervention on one or more time-varying outcomes is evaluated by determining a synthetic control region that resembles the treatment region across pre-intervention values of the outcome(s) and time-invariant covariates and that is a weighted composite of many untreated comparison regions. The microsynth procedure includes functionality that enables its user to (1) calculate weights for synthetic control, (2) tabulate results for statistical inferences, and (3) create time series plots of outcomes for treatment and synthetic control. In this article, microsynth is described in detail and its application is illustrated using data from a drug market intervention in Seattle, WA.
Citations: 21
Simulating Survival Data Using the simsurv R Package
IF 5.8 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2021-01-14 | DOI: 10.18637/JSS.V097.I03
S. Brilleman, R. Wolfe, M. Moreno-Betancur, M. Crowther
The simsurv R package allows users to simulate survival (i.e., time-to-event) data from standard parametric distributions (exponential, Weibull, and Gompertz), two-component mixture distributions, or a user-defined hazard function. Baseline covariates can be included under a proportional hazards assumption. Clustered event times, for example individuals within a family, are also easily accommodated. Time-dependent effects (i.e., nonproportional hazards) can be included by interacting covariates with linear time or a user-defined function of time. Under a user-defined hazard function, event times can be generated for a variety of complex models such as flexible (spline-based) baseline hazards, models with time-varying covariates, or joint longitudinal-survival models.
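The abstract mentions generating event times from parametric distributions under proportional hazards and from user-defined hazard functions. A common way to do both is inverse transform sampling: draw U ~ Uniform(0, 1) and solve S(T) = U, which is the same as H(T) = -log U for the cumulative hazard H. The Python sketch below illustrates this under stated assumptions. It uses a Weibull baseline with hazard h(t) = λγt^(γ-1), a single binary covariate entering as exp(βx), and administrative censoring at a hypothetical maximum follow-up time maxt. A closed-form inverse handles the Weibull case, and bisection on H covers a generic increasing cumulative hazard. All function names are hypothetical and are not simsurv's API.

```python
import math
import random

# Inverse-transform simulation of survival times (illustrative sketch; the Weibull
# parameterisation, covariate, and censoring scheme are assumptions, not simsurv's API).


def weibull_ph_time(u, lam, gamma, lin_pred):
    """Solve S(t) = exp(-lam * t**gamma * exp(lin_pred)) = u for t in closed form."""
    return (-math.log(u) / (lam * math.exp(lin_pred))) ** (1.0 / gamma)


def time_from_cumhaz(u, cumhaz, upper=1e3, tol=1e-10):
    """Solve cumhaz(t) = -log(u) by bisection; cumhaz must be increasing with cumhaz(0) = 0."""
    target = -math.log(u)
    lo, hi = 0.0, upper
    if cumhaz(hi) < target:
        return math.inf  # no event before `upper`
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cumhaz(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate(n, lam, gamma, beta, maxt, seed=1):
    """Return rows (id, x, observed_time, status) with x ~ Bernoulli(0.5).

    Event times above maxt are administratively censored (status 0, time = maxt).
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        x = rng.randint(0, 1)
        u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is always defined
        t = weibull_ph_time(u, lam, gamma, beta * x)
        rows.append((i, x, min(t, maxt), int(t <= maxt)))
    return rows


data = simulate(200, lam=0.1, gamma=1.5, beta=-0.5, maxt=5.0)
```

Passing the Weibull cumulative hazard, lam * t**gamma * exp(lin_pred), to `time_from_cumhaz` should reproduce the closed-form time up to the bisection tolerance. The same numerical route is what makes a user-defined (for example spline-based) hazard workable when no closed-form inverse exists.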
Citations: 29