R Journal: Latest Articles

glmmPen: High Dimensional Penalized Generalized Linear Mixed Models
IF 2.3; Zone 4 (Computer Science); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2023-12-01; Epub Date: 2024-04-10; DOI: 10.32614/rj-2023-086
Hillary M Heiling, Naim U Rashid, Quefeng Li, Joseph G Ibrahim

Generalized linear mixed models (GLMMs) are widely used in research for their ability to model correlated outcomes with non-Gaussian conditional distributions. The proper selection of fixed and random effects is a critical part of the modeling process, where model misspecification may lead to significant bias. However, the joint selection of fixed and random effects has historically been limited to lower dimensional GLMMs, largely due to the use of criterion-based model selection strategies. Here we present the R package glmmPen, one of the first to select fixed and random effects in higher dimension using a penalized GLMM modeling framework. Model parameters are estimated using a Monte Carlo expectation conditional minimization (MCECM) algorithm, which leverages Stan and RcppArmadillo for increased computational efficiency. Our package supports the Binomial, Gaussian, and Poisson families and multiple penalty functions. In this manuscript we discuss the modeling procedure, estimation scheme, and software implementation through application to a pancreatic cancer subtyping study. Simulation results show our method has good performance in selecting both the fixed and random effects in high dimensional GLMMs.

R Journal, vol. 15, no. 4, pp. 106-128. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11138212/pdf/
Citations: 0
binGroup2: Statistical Tools for Infection Identification via Group Testing
IF 2.1; Zone 4 (Computer Science); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2023-12-01; Epub Date: 2024-04-10; DOI: 10.32614/rj-2023-081
Christopher R Bilder, Brianna D Hitt, Brad J Biggerstaff, Joshua M Tebbs, Christopher S McMahan

Group testing is the process of testing items as an amalgamation, rather than separately, to determine the binary status of each item. Its use was especially important during the COVID-19 pandemic for testing specimens for SARS-CoV-2. Group testing has been adopted for this and many other applications because all members of a group that tests negative can be declared negative with potentially only one test, which leads to significant increases in laboratory testing capacity. Whenever a group testing algorithm is put into practice, it is critical for laboratories to understand the algorithm's operating characteristics, such as the expected number of tests. Our paper presents the binGroup2 package, which provides the statistical tools for this purpose. This R package is the first to address the identification aspect of group testing for a wide variety of algorithms. We illustrate its use through COVID-19 and chlamydia/gonorrhea applications of group testing.
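The "expected number of tests" mentioned in the abstract can be illustrated with the simplest group testing scheme, two-stage (Dorfman) testing. The sketch below is in Python and is not binGroup2's interface; the function names are hypothetical, and it assumes a perfect assay and independent items with a common prevalence p.

```python
def dorfman_expected_tests(p: float, group_size: int) -> float:
    """Expected number of tests for one group under two-stage (Dorfman) testing.

    Assumptions (not from the paper): perfect assay, independent items,
    common prevalence p.
    """
    if group_size == 1:
        return 1.0  # no pooling: one test per item
    prob_group_positive = 1.0 - (1.0 - p) ** group_size
    # One pooled test, plus one retest per member if the pool is positive.
    return 1.0 + group_size * prob_group_positive


def best_group_size(p: float, max_size: int = 50) -> int:
    """Group size (1..max_size) that minimises expected tests per item."""
    return min(range(1, max_size + 1),
               key=lambda n: dorfman_expected_tests(p, n) / n)
```

Under these assumptions, a prevalence of 1% gives an optimal group size of 11, at roughly 0.2 tests per item instead of 1.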

R Journal, vol. 15, no. 4, pp. 21-36. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11139028/pdf/
Citations: 0
Three-Way Correspondence Analysis in R
Zone 4 (Computer Science); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2023-11-09; DOI: 10.32614/rj-2023-049
Rosaria Lombardo, Michel van de Velden, Eric J. Beh
Three-way correspondence analysis is a suitable multivariate method for visualising the association in three-way categorical data, modelling the global dependence, or reducing dimensionality. This paper provides a description of an R package for performing three-way correspondence analysis: CA3variants. The functions in this package allow the analyst to perform several variations of this analysis, depending on the research question being posed and/or the properties underlying the data. Users can opt for the classical (symmetrical) approach or the non-symmetric variant - the latter is particularly useful if one of the three categorical variables is treated as a response variable. In addition, to perform the necessary three-way decompositions, a Tucker3 and a trivariate moment decomposition (using orthogonal polynomials) can be utilized. The Tucker3 method of decomposition can be used when one or more of the categorical variables is nominal while for ordinal variables the trivariate moment decomposition can be used. The package also provides a function that can be used to choose the model dimensionality.
Citations: 0
nlstac: Non-Gradient Separable Nonlinear Least Squares Fitting
Zone 4 (Computer Science); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2023-11-08; DOI: 10.32614/rj-2023-040
J. A. F. Torvisco, R. Benítez, M. R. Arias, J. Cabello Sánchez
This paper introduces a new package for nonlinear least squares fitting. The package implements a recently developed algorithm that, for certain types of nonlinear curve fitting, reduces the number of nonlinear parameters to be fitted. One notable feature of this method is that it requires no initialization, which gradient-based nonlinear fitting algorithms typically need. Instead, only bounds for the nonlinear parameters are required. Although convergence of the method is guaranteed only for exponential decay using the max-norm, the algorithm exhibits remarkable robustness, and its use has been extended to a wide range of functions using the Euclidean norm. Furthermore, this data-fitting package can also serve as a valuable resource for providing accurate initial parameters to other algorithms that rely on them.
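The idea of separating linear from nonlinear parameters, and of needing only bounds rather than a starting value, can be sketched for the exponential decay model y = a*exp(-k*t) + c. The Python code below is a generic illustration (a bounded grid search over k with a closed-form least squares solve for a and c) and is not the algorithm implemented in nlstac; all function names are hypothetical, and it uses the Euclidean norm.

```python
import math


def fit_linear_part(t, y, k):
    """For a fixed decay rate k > 0, solve for a and c in y ~ a*exp(-k*t) + c
    by ordinary least squares (2x2 normal equations)."""
    e = [math.exp(-k * ti) for ti in t]
    n = len(t)
    s_e, s_ee = sum(e), sum(ei * ei for ei in e)
    s_y, s_ey = sum(y), sum(ei * yi for ei, yi in zip(e, y))
    det = n * s_ee - s_e * s_e  # zero only if exp(-k*t) is constant
    a = (n * s_ey - s_e * s_y) / det
    c = (s_ee * s_y - s_e * s_ey) / det
    sse = sum((a * ei + c - yi) ** 2 for ei, yi in zip(e, y))
    return a, c, sse


def fit_exp_decay(t, y, k_low, k_high, grid=21, rounds=6):
    """Search k inside [k_low, k_high] on a grid, then shrink the interval
    around the best grid point. No starting value for k is needed."""
    for _ in range(rounds):
        step = (k_high - k_low) / (grid - 1)
        ks = [k_low + i * step for i in range(grid)]
        best_k = min(ks, key=lambda k: fit_linear_part(t, y, k)[2])
        k_low, k_high = max(k_low, best_k - step), min(k_high, best_k + step)
    a, c, sse = fit_linear_part(t, y, best_k)
    return a, best_k, c, sse
```

Only the single nonlinear parameter k is searched; a and c follow from a linear solve at each candidate, which is what makes the problem "separable". The resulting estimates could then seed a gradient-based fitter.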
Citations: 0
A Workflow for Estimating and Visualising Excess Mortality During the COVID-19 Pandemic
Zone 4 (Computer Science); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2023-11-08; DOI: 10.32614/rj-2023-055
Garyfallos Konstantinoudis, Virgilio Gómez-Rubio, Michela Cameletti, Monica Pirani, Gianluca Baio, Marta Blangiardo
Estimates of COVID-19 related deaths underestimate the pandemic's burden on mortality because they suffer from completeness and accuracy issues. Excess mortality is a popular alternative, as it compares the observed number of deaths with the number that would have been expected had the pandemic not occurred. The expected number of deaths depends on population trends, temperature, and spatio-temporal patterns. In addition, high geographical resolution is required to examine within-country trends and the effectiveness of different public health policies. In this tutorial, we propose a workflow using R for estimating and visualising excess mortality at high geographical resolution. We show a case study estimating excess deaths during 2020 in Italy. The proposed workflow is fast to implement and allows for combining different models and presenting aggregated results based on factors such as age, sex, and spatial location. This makes it a particularly powerful and appealing workflow for online monitoring of the pandemic burden and timely policy making.
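The core quantity in the abstract, observed deaths minus expected deaths, can be shown with a deliberately crude Python sketch. It assumes the expected count is simply the average of the same period in earlier years; the abstract indicates that the actual workflow models expected deaths from population trends, temperature, and spatio-temporal patterns, which this stand-in ignores. The function name and data layout are hypothetical.

```python
from statistics import mean


def excess_deaths(observed, history):
    """Compare observed deaths with a historical-average baseline.

    observed: {period: deaths in the pandemic year}
    history:  {period: [deaths in the same period of earlier years]}
    The baseline is an assumption of this sketch, not the paper's model.
    """
    result = {}
    for period, obs in observed.items():
        expected = mean(history[period])
        result[period] = {
            "expected": expected,
            "excess": obs - expected,
            "relative": (obs - expected) / expected,
        }
    return result
```

For example, 150 observed deaths against earlier counts of 100, 110, and 90 gives an expected count of 100, an excess of 50, and a relative excess of 0.5.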
Citations: 0
Estimating Heteroskedastic and Instrumental Variable Models for Binary Outcome Variables in R
Zone 4 (Computer Science); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2023-11-08; DOI: 10.32614/rj-2023-050
Mauricio Sarrias
The objective of this article is to introduce the package Rchoice, which provides functionality for estimating heteroskedastic and instrumental variable models for binary outcomes, with emphasis on the calculation of average marginal effects. To do so, I introduce two new functions of the Rchoice package using widely known applied examples. I also show how users can generate publication-ready tables of regression model estimates.
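As a reminder of what an average marginal effect is, the Python sketch below computes it for a plain homoskedastic probit model, where the effect of continuous regressor j is the sample mean of phi(x'beta) times beta_j. This is a simplification chosen for illustration: the heteroskedastic and instrumental variable models discussed in the article have different formulas, and none of the names below come from Rchoice.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probit_ame(X, beta):
    """Average marginal effects for continuous regressors in a probit model.

    X: list of rows (each a list of regressor values, intercept included).
    beta: coefficient list matching the columns of X.
    The entry returned for an intercept column has no substantive meaning.
    """
    n = len(X)
    avg_density = sum(
        normal_pdf(sum(x * b for x, b in zip(row, beta))) for row in X
    ) / n
    return [avg_density * b for b in beta]
```

Because the density is averaged over observations before scaling by the coefficient, the result summarises the effect across the sample rather than at a single representative point.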
Citations: 0
Generalized Estimating Equations using the new R package glmtoolbox
Zone 4 (Computer Science); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2023-11-01; DOI: 10.32614/rj-2023-056
L.H. Vanegas, L.M. Rondón, G.A. Paula
This paper introduces a comprehensive implementation, available in the new R package glmtoolbox, of Generalized Estimating Equations (GEE), a flexible statistical tool that analyzes cluster-correlated data using marginal models. As well as providing more built-in structures for the working correlation matrix than other GEE implementations in R, this implementation allows the user to: (1) compute several estimates of the variance-covariance matrix of the estimators of the parameters of interest; (2) compute several criteria to assist the selection of the structure for the working correlation matrix; (3) compare nested models using the Wald test as well as the generalized score test; (4) assess the goodness-of-fit of the model using Pearson-, deviance- and Mahalanobis-type residuals; (5) perform sensitivity analysis using the global influence approach (that is, the dfbeta statistic and Cook's distance) as well as the local influence approach; (6) use several criteria to perform variable selection using a hybrid stepwise procedure; (7) fit models with nonlinear predictors; (8) handle dropout-type missing data under the MAR rather than the MCAR assumption by using observation-specific or cluster-specific weighted methods. The capabilities of this GEE implementation are illustrated by analyzing four real datasets obtained from longitudinal studies.
Citations: 0
Taking the Scenic Route: Interactive and Performant Tour Animations
Zone 4 (Computer Science); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2023-11-01; DOI: 10.32614/rj-2023-052
Casper Hart, Earo Wang
The tour provides a useful vehicle for exploring high dimensional datasets. It works by combining a sequence of projections (the tour path) into an animation (the display method). Current display implementations in R are limited in their interactivity and portability, and give poor performance and jerky animations even for small datasets. We take a detour into web technologies, such as Three.js and WebGL, to support smooth and performant tour visualisations. The R package detourr implements a set of display tools that allow for rich interactions (including orbit controls, scrubbing, and brushing) and smooth animations for large datasets. It provides a declarative R interface which is accessible to new users, and it supports linked views using crosstalk and shiny. The resulting animations are portable across a wide range of browsers and devices. We also extend the radial transformation of the Sage Tour (@laa2021burning) to 3 or more dimensions with an implementation in 3D, and provide a simplified implementation of the Slice Tour (@laa2020slice).
Citations: 0
hydrotoolbox, a Package for Hydrometeorological Data Management
Zone 4 (Computer Science); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2023-11-01; DOI: 10.32614/rj-2023-041
Ezequiel Toum, Pierre Pitte
The hydrometeorological data provided by federal agencies, research groups and private companies tend to be heterogeneous: records are kept in different formats, quality control processes are not standardized and may even vary within a given agency, variables are not always recorded with the same temporal resolution, and there are data gaps and incorrectly recorded values. Once these problems are dealt with, it is useful to have tools to safely store and manipulate the series, providing temporal aggregation, interactive visualization for analysis, static graphics to publish and/or communicate results, and techniques to correct and/or modify the series, among others. Here we introduce a package written in the R language using object-oriented programming and designed to accomplish these objectives, giving the user a general framework for working with any kind of hydrometeorological series. We present the package design, its strengths and limitations, and show its application to two real cases.
Citations: 0
Gaussian Mixture Models in R
Zone 4 (Computer Science); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2023-11-01; DOI: 10.32614/rj-2023-043
Bastien Chassagnol, Antoine Bichat, Cheïma Boudjeniba, Pierre-Henri Wuillemin, Mickaël Guedj, David Gohel, Gregory Nuel, Etienne Becht
Gaussian mixture models (GMMs) are widely used for modelling stochastic problems, and a wide variety of R packages implement them. However, no recent review describes the main features these packages offer or compares their performance. In this article, we first introduce GMMs and the EM algorithm used to estimate the model parameters, and analyse the main features implemented in seven of the most widely used R packages. We then empirically compare their statistical and computational performance in relation to the choice of initialisation algorithm and the complexity of the mixture. We demonstrate that, for well-separated components or for a small number of components with distinguishable modes, the best estimation is obtained with REBMIX initialisation, implemented in the [rebmix](https://CRAN.R-project.org/package=rebmix) package, while for highly overlapping components the best estimation is obtained with *k*-means or random initialisation. Importantly, we show that implementation details of the EM algorithm lead to differences in the parameter estimates. In particular, the packages [mixtools](https://CRAN.R-project.org/package=mixtools) (Young et al. 2020) and [Rmixmod](https://CRAN.R-project.org/package=Rmixmod) (Langrognet et al. 2021) estimate the parameters of the mixture with smaller bias, while the RMSE and variability of the estimates are smaller with the packages [bgmm](https://CRAN.R-project.org/package=bgmm) (Ewa Szczurek 2021), [EMCluster](https://CRAN.R-project.org/package=EMCluster) (W.-C. Chen and Maitra 2022), [GMKMcharlie](https://CRAN.R-project.org/package=GMKMcharlie) (Liu 2021), [flexmix](https://CRAN.R-project.org/package=flexmix) (Gruen and Leisch 2022) and [mclust](https://CRAN.R-project.org/package=mclust) (Fraley, Raftery, and Scrucca 2022). The comparison of these packages provides R users with useful recommendations for improving the computational and statistical performance of their clustering and for identifying common deficiencies. Additionally, we propose several improvements for the development of a future, unified mixture model package.
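For readers unfamiliar with the EM algorithm mentioned in the abstract, the following is a minimal sketch of EM for a two-component univariate GMM. It is written in plain Python rather than R and is not taken from any of the reviewed packages. The function names (`normal_pdf`, `em_gmm_1d`), the simulated data, the fixed iteration count, and the simple min/max initialisation of the means are all assumptions made for illustration; the packages discussed above use their own initialisation and stopping rules.

```python
import math
import random
import statistics


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_gmm_1d(data, means, sds, weights, n_iter=100):
    """Run a fixed number of EM iterations for a univariate GMM.

    Illustrative helper (hypothetical name): no convergence check and no
    protection against numerical underflow for extreme outliers.
    """
    k, n = len(means), len(data)
    means, sds, weights = list(means), list(sds), list(weights)
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component
        # for each observation, given the current parameters.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], sds[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of weights, means and sds.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), 1e-6)  # floor avoids a degenerate component
    return means, sds, weights


# Simulated, well-separated mixture (assumed example data): N(0, 1) and N(10, 1).
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
data += [rng.gauss(10.0, 1.0) for _ in range(200)]

# Crude initialisation (an assumption of this sketch): means at the data
# extremes, a common standard deviation, and equal weights.
overall_sd = statistics.pstdev(data)
means, sds, weights = em_gmm_1d(
    data, [min(data), max(data)], [overall_sd, overall_sd], [0.5, 0.5]
)
print("means:", means)
print("sds:", sds)
print("weights:", weights)
```

With components this well separated, almost any reasonable starting point recovers the two modes. The abstract's point is that the choice of initialisation matters much more once the components overlap heavily.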
Citations: 0