R Journal: latest articles, page 7

SemiCompRisks: An R Package for the Analysis of Independent and Cluster-correlated Semi-competing Risks Data.

IF 2.1 Zone 4 Computer Science Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

R Journal

Pub Date : 2019-06-01 Epub Date: 2019-08-20 DOI: 10.32614/rj-2019-038

Danilo Alvares, Sebastien Haneuse, Catherine Lee, Kyu Ha Lee

Semi-competing risks refer to the setting where primary scientific interest lies in estimation and inference with respect to a non-terminal event, the occurrence of which is subject to a terminal event. In this paper, we present the R package SemiCompRisks that provides functions to perform the analysis of independent/clustered semi-competing risks data under the illness-death multi-state model. The package allows the user to choose the specification for model components from a range of options giving users substantial flexibility, including: accelerated failure time or proportional hazards regression models; parametric or non-parametric specifications for baseline survival functions; parametric or non-parametric specifications for random effects distributions when the data are cluster-correlated; and, a Markov or semi-Markov specification for terminal event following non-terminal event. While estimation is mainly performed within the Bayesian paradigm, the package also provides the maximum likelihood estimation for select parametric models. The package also includes functions for univariate survival analysis as complementary analysis tools.
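
The data structure described above can be illustrated with a small simulation. This is a minimal sketch in Python, not the SemiCompRisks API: the function name `simulate_subject`, the constant hazards `h1`, `h2`, `h3`, and the fixed censoring time are all assumptions made for illustration. It shows the defining asymmetry of the illness-death setting: the terminal event can preclude the non-terminal event, but not the other way round.

```python
import random


def simulate_subject(rng, h1, h2, h3, censor_time):
    """Simulate one subject under an illness-death model with constant hazards.

    h1: hazard of the non-terminal event (hypothetical parameter name)
    h2: hazard of the terminal event before any non-terminal event
    h3: hazard of the terminal event after the non-terminal event
    Returns (y1, d1, y2, d2): observed times and event indicators.
    """
    t1 = rng.expovariate(h1)  # latent time to the non-terminal event
    t2 = rng.expovariate(h2)  # latent time to the terminal event
    if t1 < t2:
        # Non-terminal event happens first; the terminal event follows later.
        # With a constant h3 the Markov and semi-Markov views coincide.
        y1, d1 = t1, 1
        y2, d2 = t1 + rng.expovariate(h3), 1
    else:
        # Terminal event happens first and precludes the non-terminal event.
        y1, d1 = t2, 0
        y2, d2 = t2, 1
    # Administrative censoring applies to both observed times.
    if y2 > censor_time:
        y2, d2 = censor_time, 0
    if y1 > censor_time:
        y1, d1 = censor_time, 0
    return y1, d1, y2, d2


rng = random.Random(1)
subjects = [simulate_subject(rng, 0.5, 0.3, 0.8, 5.0) for _ in range(1000)]
```

Each simulated record satisfies `y1 <= y2`, and whenever the non-terminal event is not observed (`d1 == 0`) the two observed times coincide, which is the pattern semi-competing risks methods are designed to handle.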

R Journal, volume 11, issue 1, pages 376-400. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7889044/pdf/nihms-1668679.pdf

Citations: 15

What's for dynr: A Package for Linear and Nonlinear Dynamic Modeling in R.

IF 2.1 Zone 4 Computer Science Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

R Journal

Pub Date : 2019-06-01 DOI: 10.32614/rj-2019-012

Lu Ou, Michael D Hunter, Sy-Miin Chow

Intensive longitudinal data in the behavioral sciences are often noisy, multivariate in nature, and may involve multiple units undergoing regime switches by showing discontinuities interspersed with continuous dynamics. Despite increasing interest in using linear and nonlinear differential/difference equation models with regime switches, there has been a scarcity of software packages that are fast and freely accessible. We have created an R package called dynr that can handle a broad class of linear and nonlinear discrete- and continuous-time models, with regime-switching properties and linear Gaussian measurement functions, in C, while maintaining simple and easy-to-learn model specification functions in R. We present the mathematical and computational bases used by the dynr R package, and present two illustrative examples to demonstrate the unique features of dynr.

Citations: 38

rFSA: An R Package for Finding Best Subsets and Interactions.

IF 2.1 Zone 4 Computer Science Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

R Journal

Pub Date : 2018-12-01 Epub Date: 2018-12-08 DOI: 10.32614/rj-2018-059

Joshua Lambert, Liyu Gong, Corrine F Elliott, Katherine Thompson, Arnold Stromberg

Herein we present the R package rFSA, which implements an algorithm for improved variable selection. The algorithm searches a data space for models of a user-specified form that are statistically optimal under a measure of model quality. Many iterations afford a set of feasible solutions (or candidate models) that the researcher can evaluate for relevance to his or her questions of interest. The algorithm can be used to formulate new or to improve upon existing models in bioinformatics, health care, and myriad other fields in which the volume of available data has outstripped researchers' practical and computational ability to explore larger subsets or higher-order interaction terms. The package accommodates linear and generalized linear models, as well as a variety of criterion functions such as Allen's PRESS and AIC. New modeling strategies and criterion functions can be adapted easily to work with rFSA.
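
The search described above can be sketched as a swap-based local search repeated from several random starts. This is an illustrative Python sketch under stated assumptions, not rFSA's implementation: the names `swap_search` and `feasible_solutions` are hypothetical, and the toy additive criterion stands in for a real model-quality measure such as AIC, where lower is better.

```python
import random


def swap_search(variables, size, criterion, rng):
    """Start from a random subset and swap one variable at a time
    while any single swap lowers the criterion."""
    current = tuple(sorted(rng.sample(variables, size)))
    best = criterion(current)
    improved = True
    while improved:
        improved = False
        for position in range(size):
            for candidate in variables:
                if candidate in current:
                    continue
                trial = list(current)
                trial[position] = candidate
                trial = tuple(sorted(trial))
                score = criterion(trial)
                if score < best:
                    current, best, improved = trial, score, True
    return current, best


def feasible_solutions(variables, size, criterion, starts=20, seed=0):
    """Collect the distinct swap-optimal subsets reached from random starts."""
    rng = random.Random(seed)
    found = {}
    for _ in range(starts):
        subset, score = swap_search(variables, size, criterion, rng)
        found[subset] = score
    return found


# Toy criterion (an assumption for illustration): lower is better.
value = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
solutions = feasible_solutions(list(value), 2, lambda s: -sum(value[v] for v in s))
```

With an additive criterion every start converges to the same subset; with a criterion that includes interactions, different starts can end at different local optima, which is what yields a set of candidate models to review.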

Citations: 24

Semiparametric Generalized Linear Models with the gldrm Package.

IF 2.1 Zone 4 Computer Science Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

R Journal

Pub Date : 2018-07-01

Michael J Wurm, Paul J Rathouz

This paper introduces a new algorithm to estimate and perform inferences on a recently proposed and developed semiparametric generalized linear model (glm). Rather than selecting a particular parametric exponential family model, such as the Poisson distribution, this semiparametric glm assumes that the response is drawn from the more general exponential tilt family. The regression coefficients and unspecified reference distribution are estimated by maximizing a semiparametric likelihood. The new algorithm incorporates several computational stability and efficiency improvements over the algorithm originally proposed. In particular, the new algorithm performs well for either small or large support for the nonparametric response distribution. The algorithm is implemented in a new R package called gldrm.
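
To make the exponential tilt idea concrete, here is a minimal Python sketch. It assumes the usual construction in which a reference distribution f0 on a finite support is reweighted by exp(theta * y) and renormalized, with theta chosen so that the tilted mean equals a target mean. The function names and the bisection solver are illustrative assumptions and are not taken from gldrm.

```python
import math


def tilt(support, f0, theta):
    """Reweight reference probabilities f0 by exp(theta * y) and renormalize."""
    shift = max(theta * y for y in support)  # subtract the max for stability
    weights = [p * math.exp(theta * y - shift) for y, p in zip(support, f0)]
    total = sum(weights)
    return [w / total for w in weights]


def tilted_mean(support, f0, theta):
    """Mean of the tilted distribution."""
    return sum(y * p for y, p in zip(support, tilt(support, f0, theta)))


def solve_theta(support, f0, target_mean, lo=-50.0, hi=50.0, tol=1e-10):
    """Find theta by bisection; the tilted mean is increasing in theta.

    Assumes target_mean lies strictly inside the range of the support.
    """
    for _ in range(200):
        mid = (lo + hi) / 2
        if tilted_mean(support, f0, mid) < target_mean:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


support = [0, 1, 2, 3]
f0 = [0.25, 0.25, 0.25, 0.25]
theta = solve_theta(support, f0, 2.0)
```

In a regression setting the target mean would come from the linear predictor through a link function; here it is fixed at 2.0 so that the tilting step can be seen in isolation.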

Citations: 0

Semiparametric Generalized Linear Models with the gldrm Package

IF 2.1 Zone 4 Computer Science Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

R Journal

Pub Date : 2018-07-01 DOI: 10.32614/RJ-2018-027

Mike Wurm, P. Rathouz

This paper introduces a new algorithm to estimate and perform inferences on a recently proposed and developed semiparametric generalized linear model (glm). Rather than selecting a particular parametric exponential family model, such as the Poisson distribution, this semiparametric glm assumes that the response is drawn from the more general exponential tilt family. The regression coefficients and unspecified reference distribution are estimated by maximizing a semiparametric likelihood. The new algorithm incorporates several computational stability and efficiency improvements over the algorithm originally proposed. In particular, the new algorithm performs well for either small or large support for the nonparametric response distribution. The algorithm is implemented in a new R package called gldrm.

Citations: 3

MGLM: An R Package for Multivariate Categorical Data Analysis.

IF 2.1 Zone 4 Computer Science Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

R Journal

Pub Date : 2018-07-01 DOI: 10.32614/rj-2018-015

Juhyun Kim, Yiwen Zhang, Joshua Day, Hua Zhou

Data with multiple responses is ubiquitous in modern applications. However, few tools are available for regression analysis of multivariate counts. The most popular multinomial-logit model has a very restrictive mean-variance structure, limiting its applicability to many data sets. This article introduces an R package MGLM, short for multivariate response generalized linear models, that expands the current tools for regression analysis of polytomous data. Distribution fitting, random number generation, regression, and sparse regression are treated in a unifying framework. The algorithm, usage, and implementation details are discussed.

Citations: 14

A System for an Accountable Data Analysis Process in R.

IF 2.1 Zone 4 Computer Science Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

R Journal

Pub Date : 2018-07-01 Epub Date: 2018-05-15

Jonathan Gelfond, Martin Goros, Brian Hernandez, Alex Bokov

Efficiently producing transparent analyses may be difficult for beginners or tedious for the experienced. This implies a need for computing systems and environments that can efficiently satisfy reproducibility and accountability standards. To this end, we have developed a system, R package, and R Shiny application called adapr (Accountable Data Analysis Process in R) that is built on the principle of accountable units. An accountable unit is a data file (statistic, table or graphic) that can be associated with a provenance, meaning how it was created, when it was created and who created it, and this is similar to the 'verifiable computational results' (VCR) concept proposed by Gavish and Donoho. Both accountable units and VCRs are version controlled, sharable, and can be incorporated into a collaborative project. However, accountable units use file hashes and do not involve watermarking or public repositories like VCRs. Reproducing collaborative work may be highly complex, requiring repeating computations on multiple systems from multiple authors; however, determining the provenance of each unit is simpler, requiring only a search using file hashes and version control systems.
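
The hash-based provenance lookup mentioned above can be sketched in a few lines. This Python example is an assumption-laden illustration and not adapr's interface: the in-memory `registry`, the function names, and the file, script, author, and date values are all hypothetical. It shows why a file hash is enough to recover how, when, and by whom a result file was produced, and why any later edit to the file breaks the match.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_provenance(registry, path, script, author, created):
    """Store how, by whom, and when a result file was made, keyed by its hash."""
    key = file_sha256(path)
    registry[key] = {
        "file": path.name,
        "script": script,
        "author": author,
        "created": created,
    }
    return key


def lookup_provenance(registry, path):
    """Look up a file's provenance record by hashing its current contents."""
    return registry.get(file_sha256(path))


def demo():
    """Register a result file, look it up, then show an edit breaks the match."""
    registry = {}
    with tempfile.TemporaryDirectory() as tmp:
        result = Path(tmp) / "table1.csv"
        result.write_text("group,mean\na,1.5\nb,2.5\n")
        record_provenance(registry, result, "make_table1.R", "analyst", "2018-01-01")
        found = lookup_provenance(registry, result)
        result.write_text("group,mean\na,1.5\nb,9.9\n")  # file modified afterwards
        after_edit = lookup_provenance(registry, result)
    return {"found": found, "after_edit": after_edit}


outcome = demo()
```

A real system would persist the registry under version control rather than keep it in memory; the dictionary here only demonstrates the lookup principle.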

R Journal, volume 10, issue 1, pages 6-21. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6261481/pdf/nihms962940.pdf

Citations: 0

A System for an Accountable Data Analysis Process in R

IF 2.1 Zone 4 Computer Science Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

R Journal

Pub Date : 2018-05-15 DOI: 10.32614/RJ-2018-001

J. Gelfond, M. Goros, B. Hernandez, A. Bokov

Efficiently producing transparent analyses may be difficult for beginners or tedious for the experienced. This implies a need for computing systems and environments that can efficiently satisfy reproducibility and accountability standards. To this end, we have developed a system, R package, and R Shiny application called adapr (Accountable Data Analysis Process in R) that is built on the principle of accountable units. An accountable unit is a data file (statistic, table or graphic) that can be associated with a provenance, meaning how it was created, when it was created and who created it, and this is similar to the 'verifiable computational results' (VCR) concept proposed by Gavish and Donoho. Both accountable units and VCRs are version controlled, sharable, and can be incorporated into a collaborative project. However, accountable units use file hashes and do not involve watermarking or public repositories like VCRs. Reproducing collaborative work may be highly complex, requiring repeating computations on multiple systems from multiple authors; however, determining the provenance of each unit is simpler, requiring only a search using file hashes and version control systems.

R Journal, volume 10, issue 1, pages 6-21.

Citations: 14

R Package imputeTestbench to Compare Imputation Methods for Univariate Time Series.

IF 2.3 Zone 4 Computer Science Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

R Journal

Pub Date : 2018-01-01

Marcus W Beck, Neeraj Bokde, Gualberto Asencio-Cortés, Kishore Kulat

Missing observations are common in time series data and several methods are available to impute these values prior to analysis. Variation in statistical characteristics of univariate time series can have a profound effect on characteristics of missing observations and, therefore, the accuracy of different imputation methods. The imputeTestbench package can be used to compare the prediction accuracy of different methods as related to the amount and type of missing data for a user-supplied dataset. Missing data are simulated by removing observations completely at random or in blocks of different sizes depending on characteristics of the data. Several imputation algorithms are included with the package that vary from simple replacement with means to more complex interpolation methods. The testbench is not limited to the default functions and users can add or remove methods as needed. Plotting functions also allow comparative visualization of the behavior and effectiveness of different algorithms. We present example applications that demonstrate how the package can be used to understand differences in prediction accuracy between methods as affected by characteristics of a dataset and the nature of missing data.
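
The benchmarking loop described above (remove known values, impute them, score the result) can be sketched as follows. This is a minimal Python illustration and not the imputeTestbench interface: the function names, the 20% missing fraction, the sine-wave series, and the choice of RMSE as the error measure are all assumptions made for the example. Only removal completely at random is shown; block removal is omitted.

```python
import math
import random


def remove_at_random(series, fraction, rng):
    """Set a fraction of interior points to None, completely at random."""
    interior = list(range(1, len(series) - 1))  # keep both endpoints observed
    missing = set(rng.sample(interior, int(fraction * len(interior))))
    return [None if i in missing else v for i, v in enumerate(series)]


def impute_mean(series):
    """Replace every missing value with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]


def impute_linear(series):
    """Fill each gap by linear interpolation between its observed neighbours."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def rmse(truth, estimate, mask):
    """Root mean squared error over the removed positions only."""
    errors = [(truth[i] - estimate[i]) ** 2 for i in mask]
    return math.sqrt(sum(errors) / len(errors))


def compare(series, fraction=0.2, seed=0):
    """Score both imputation methods on the same simulated missingness."""
    rng = random.Random(seed)
    holed = remove_at_random(series, fraction, rng)
    mask = [i for i, v in enumerate(holed) if v is None]
    return {
        "mean": rmse(series, impute_mean(holed), mask),
        "linear": rmse(series, impute_linear(holed), mask),
    }


series = [math.sin(2 * math.pi * t / 50) for t in range(200)]
scores = compare(series)
```

On a smooth periodic series like this one, interpolation recovers the removed points far better than mean replacement; on a noisy or irregular series the ranking could differ, which is the kind of dependence on data characteristics the abstract refers to.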

R Journal, volume 10, issue 1, pages 218-233. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6309171/pdf/nihms-1507947.pdf

Citations: 0

PanJen: An R package for Ranking Transformations in a Linear Regression

IF 2.1 Zone 4 Computer Science Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

R Journal

Pub Date : 2018-01-01 DOI: 10.32614/RJ-2018-018

C. U. Jensen, T. Panduro

Citations: 1