{"title":"Broken Stick Model for Irregular Longitudinal Data","authors":"S. Buuren","doi":"10.18637/jss.v106.i07","DOIUrl":"https://doi.org/10.18637/jss.v106.i07","url":null,"abstract":"","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"106 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67679228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DataFrames.jl is a package written for and in the Julia language offering flexible and efficient handling of tabular data sets in memory. Thanks to Julia's unique strengths, it provides an appealing set of features: Rich support for standard data processing tasks and excellent flexibility and efficiency for more advanced and non-standard operations. We present the fundamental design of the package and how it compares with implementations of data frames in other languages, its main features, performance, and possible extensions. We conclude with a practical illustration of typical data processing operations.
{"title":"<b>DataFrames.jl</b>: Flexible and Fast Tabular Data in <i>Julia</i>","authors":"Milan Bouchet-Valat, Bogumił Kamiński","doi":"10.18637/jss.v107.i04","DOIUrl":"https://doi.org/10.18637/jss.v107.i04","url":null,"abstract":"DataFrames.jl is a package written for and in the Julia language offering flexible and efficient handling of tabular data sets in memory. Thanks to Julia's unique strengths, it provides an appealing set of features: Rich support for standard data processing tasks and excellent flexibility and efficiency for more advanced and non-standard operations. We present the fundamental design of the package and how it compares with implementations of data frames in other languages, its main features, performance, and possible extensions. We conclude with a practical illustration of typical data processing operations.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135653276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alberto Garre, Jeroen Koomen, Heidy M. W. den Besten, Marcel H. Zwietering
The growth of populations is of interest in a broad variety of fields, such as epidemiology, economics or biology. Although a large variety of growth models are available in the scientific literature, their application usually requires advanced knowledge of mathematical programming and statistical inference, especially when modelling growth under dynamic environmental conditions. This article presents the biogrowth package for R, which implements functions for modelling the growth of populations. It can predict growth under static or dynamic environments, considering the effect of an arbitrary number of environmental factors. Moreover, it can be used to fit growth models to data gathered under static or dynamic environmental conditions. The package allows the user to fix any model parameter prior to the fit, an approach that can mitigate identifiability issues associated with growth models. The package includes common S3 methods for visualization and statistical analysis (summary of the fit, predictions, etc.), easing result interpretation. It also includes functions for model comparison/selection. We illustrate the functions in biogrowth using examples from food science and economics.
{"title":"Modeling Population Growth in <i>R</i> with the <b>biogrowth</b> Package","authors":"Alberto Garre, Jeroen Koomen, Heidy M. W. den Besten, Marcel H. Zwietering","doi":"10.18637/jss.v107.i01","DOIUrl":"https://doi.org/10.18637/jss.v107.i01","url":null,"abstract":"The growth of populations is of interest in a broad variety of fields, such as epidemiology, economics or biology. Although a large variety of growth models are available in the scientific literature, their application usually requires advanced knowledge of mathematical programming and statistical inference, especially when modelling growth under dynamic environmental conditions. This article presents the biogrowth package for R, which implements functions for modelling the growth of populations. It can predict growth under static or dynamic environments, considering the effect of an arbitrary number of environmental factors. Moreover, it can be used to fit growth models to data gathered under static or dynamic environmental conditions. The package allows the user to fix any model parameter prior to the fit, an approach that can mitigate identifiability issues associated to growth models. The package includes common S3 methods for visualization and statistical analysis (summary of the fit, predictions, . . . ), easing result interpretation. It also includes functions for model comparison/selection. We illustrate the functions in biogrowth using examples from food science and economy.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135312289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Covariate-adaptive randomization is gaining popularity in clinical trials because it enables the generation of balanced allocations with respect to covariates. Over the past decade, substantial progress has been made in both innovative randomization procedures and the theoretical properties of associated inferences. However, these results are scattered across the literature, and no single toolkit exists for clinical trial practitioners and researchers to conduct and evaluate these methods. The R package carat is proposed to address this need. It facilitates a broad range of covariate-adaptive randomization and testing procedures, such as the most common and classical methods, and also reflects recent developments in the field. The package contains comprehensive evaluation and comparison tools for use in both randomization procedures and tests. This enables power analysis to be conducted to assist the planning of a covariate-adaptive clinical trial. The package also implements a command-line interface to allow for an interactive allocation procedure, which is typically the case in real-world applications. In this paper, the features and functionalities of carat are presented.
{"title":"<b>carat</b>: An <i>R</i> Package for Covariate-Adaptive Randomization in Clinical Trials","authors":"Wei Ma, Xiaoqing Ye, Fuyi Tu, Feifang Hu","doi":"10.18637/jss.v107.i02","DOIUrl":"https://doi.org/10.18637/jss.v107.i02","url":null,"abstract":"Covariate-adaptive randomization is gaining popularity in clinical trials because they enable the generation of balanced allocations with respect to covariates. Over the past decade, substantial progress has been made in both new innovative randomization procedures and the theoretical properties of associated inferences. However, these results are scattered across the literature, and a single tool kit does not exist for use by clinical trial practitioners and researchers to conduct and evaluate these methods. The R package carat is proposed to address this need. It facilitates a broad range of covariate-adaptive randomization and testing procedures, such as the most common and classical methods, and also reflects recent developments in the field. The package contains comprehensive evaluation and comparison tools for use in both randomization procedures and tests. This enables power analysis to be conducted to assist the planning of a covariate-adaptive clinical trial. The package also implements a command-line interface to allow for an interactive allocation procedure, which is typically the case in real-world applications. In this paper, the features and functionalities of carat are presented.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135312292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
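To make the idea of covariate-adaptive allocation in the carat abstract concrete, here is a minimal Python sketch of one classical procedure, Pocock-Simon-style minimization for two arms. It is an illustration only and does not reproduce the carat package or its interface; the function name `minimization_assign`, the covariate names, and the default biased-coin probability `p_best=0.85` are all assumptions made for this example.

```python
import random
from collections import defaultdict


def minimization_assign(covariates, counts, p_best=0.85, rng=random):
    """Assign one patient to arm 0 or 1 by Pocock-Simon-style minimization.

    covariates: dict such as {"sex": "F", "age_group": "old"} (hypothetical names).
    counts: mapping (covariate, level) -> [n_arm0, n_arm1]; updated in place.
    p_best: probability of picking the arm that yields the smaller imbalance.
    """
    # Marginal imbalance that would result from placing the patient in each arm,
    # summed over the covariate levels this patient belongs to.
    imbalance = []
    for arm in (0, 1):
        total = 0
        for key in covariates.items():
            n = list(counts[key])
            n[arm] += 1
            total += abs(n[0] - n[1])
        imbalance.append(total)

    if imbalance[0] == imbalance[1]:
        arm = rng.choice((0, 1))  # tie: plain coin flip
    else:
        best = 0 if imbalance[0] < imbalance[1] else 1
        arm = best if rng.random() < p_best else 1 - best  # biased coin

    # Record the allocation in every margin the patient belongs to.
    for key in covariates.items():
        counts[key][arm] += 1
    return arm


# Usage: allocate 200 simulated patients and inspect the marginal balance.
demo_rng = random.Random(1)
demo_counts = defaultdict(lambda: [0, 0])
demo_patients = [
    {"sex": demo_rng.choice("MF"), "age_group": demo_rng.choice(["young", "old"])}
    for _ in range(200)
]
demo_arms = [minimization_assign(p, demo_counts, rng=demo_rng) for p in demo_patients]
```

Setting `p_best=1.0` makes the rule deterministic whenever the two arms differ in imbalance, which is convenient for checking the logic but removes most of the randomness a real trial would want.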
L. Augugliaro, G. Sottile, E. C. Wit, V. Vinciotti
Sparse graphical models have revolutionized multivariate inference. With the advent of high-dimensional multivariate data in many applied fields, these methods are able to detect a much lower-dimensional structure, often represented via a sparse conditional independence graph. There have been numerous extensions of such methods in the past decade. Many practical applications have additional covariates or suffer from missing or censored data. Despite the development of these extensions of sparse inference methods for graphical models, there have so far been no implementations for, e.g., conditional graphical models. Here we present the general-purpose package cglasso for estimating sparse conditional Gaussian graphical models with potentially missing or censored data. The method employs an efficient expectation-maximization estimation of an l1-penalized likelihood via a block-coordinate descent algorithm. The package has a user-friendly data manipulation interface. It estimates a solution path and includes various automatic selection algorithms for the two l1 tuning parameters, associated with the sparse precision matrix and sparse regression coefficients, respectively. The package pays particular attention to the visualization of the results, both by means of marginal tables and figures, and of the inferred conditional independence graphs. This package provides a unique and computationally efficient implementation of a conditional Gaussian graphical model that is able to deal with the additional complications of missing and censored data. As such it constitutes an important contribution for empirical scientists wishing to detect sparse structures in high-dimensional data.
{"title":"cglasso: An R Package for Conditional Graphical Lasso Inference with Censored and Missing Values","authors":"L. Augugliaro, G. Sottile, E. C. Wit, V. Vinciotti","doi":"10.18637/jss.v105.i01","DOIUrl":"https://doi.org/10.18637/jss.v105.i01","url":null,"abstract":"Sparse graphical models have revolutionized multivariate inference. With the advent of high-dimensional multivariate data in many applied fields, these methods are able to detect a much lower-dimensional structure, often represented via a sparse conditional independence graph. There have been numerous extensions of such methods in the past decade. Many practical applications have additional covariates or suffer from missing or censored data. Despite the development of these extensions of sparse inference methods for graphical models, there have been so far no implementations for, e.g., conditional graphical models. Here we present the general-purpose package cglasso for estimating sparse conditional Gaussian graphical models with potentially missing or censored data. The method employs an efficient expectation-maximization estimation of an l1-penalized likelihood via a block-coordinate descent algorithm. The package has a user-friendly data manipulation interface. It estimates a solution path and includes various automatic selection algorithms for the two l1 tuning parameters, associated with the sparse precision matrix and sparse regression coefficients, respectively. The package pays particular attention to the visualization of the results, both by means of marginal tables and figures, and of the inferred conditional independence graphs. This package provides a unique and computational efficient implementation of a conditional Gaussian graphical model that is able to deal with the additional complications of missing and censored data. As such it constitutes an important contribution for empirical scientists wishing to detect sparse structures in high-dimensional data.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"68 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86751288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The lasso and elastic net are popular regularized regression models for supervised learning. Friedman, Hastie, and Tibshirani (2010) introduced a computationally efficient algorithm for computing the elastic net regularization path for ordinary least squares regression, logistic regression and multinomial logistic regression, while Simon, Friedman, Hastie, and Tibshirani (2011) extended this work to Cox models for right-censored data. We further extend the reach of the elastic net-regularized regression to all generalized linear model families, Cox models with (start, stop] data and strata, and a simplified version of the relaxed lasso. We also discuss convenient utility functions for measuring the performance of these fitted models.
{"title":"Elastic Net Regularization Paths for All Generalized Linear Models.","authors":"J Kenneth Tay, Balasubramanian Narasimhan, Trevor Hastie","doi":"10.18637/jss.v106.i01","DOIUrl":"10.18637/jss.v106.i01","url":null,"abstract":"The lasso and elastic net are popular regularized regression models for supervised learning. Friedman, Hastie, and Tibshirani (2010) introduced a computationally efficient algorithm for computing the elastic net regularization path for ordinary least squares regression, logistic regression and multinomial logistic regression, while Simon, Friedman, Hastie, and Tibshirani (2011) extended this work to Cox models for right-censored data. We further extend the reach of the elastic net-regularized regression to all generalized linear model families, Cox models with (start, stop] data and strata, and a simplified version of the relaxed lasso. We also discuss convenient utility functions for measuring the performance of these fitted models.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"106 ","pages":""},"PeriodicalIF":5.8,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10153598/pdf/nihms-1843576.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9776933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
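The elastic net regularization path mentioned in the abstract above can be illustrated with a short Python sketch for the ordinary least squares case only. It uses cyclic coordinate descent with soft-thresholding and warm starts along a decreasing grid of penalties. This is a teaching sketch under stated assumptions, not the glmnet implementation: the function names are hypothetical, `X` and `y` are assumed centered (no intercept is fitted), no column of `X` is all zeros, and the objective is taken to be (1/2n)||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2).

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def elastic_net_cd(X, y, lam, alpha=0.5, n_iter=200, tol=1e-8, beta0=None):
    """Cyclic coordinate descent for the least-squares elastic net (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * x_j' x_j for each column
    resid = y - X @ beta                   # full residual, kept up to date
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # (1/n) * x_j' (partial residual that excludes coordinate j)
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:               # stop once a full sweep barely moves
            break
    return beta


def elastic_net_path(X, y, alpha=0.5, n_lambda=20, ratio=1e-3):
    """Solve on a decreasing lambda grid, warm-starting each fit (requires alpha > 0)."""
    n = X.shape[0]
    lam_max = np.max(np.abs(X.T @ y)) / (n * alpha)   # smallest lambda with beta = 0
    lambdas = lam_max * np.logspace(0, np.log10(ratio), n_lambda)
    beta = np.zeros(X.shape[1])
    betas = []
    for lam in lambdas:
        beta = elastic_net_cd(X, y, lam, alpha, beta0=beta)
        betas.append(beta.copy())
    return lambdas, np.array(betas)
```

Starting at the largest penalty, where every coefficient is zero, and reusing each solution as the starting point for the next penalty is what makes computing a whole path cheap compared with solving each problem from scratch.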
We develop an R package panelView and a Stata package panelview for panel data visualization. They are designed to assist causal analysis with panel data and have three main functionalities: (1) They plot the treatment status and missing values in a panel dataset; (2) they visualize the temporal dynamics of the main variables of interest; and (3) they depict the bivariate relationships between a treatment variable and an outcome variable either by unit or in aggregate. These tools can help researchers better understand their panel datasets before conducting statistical analysis.
{"title":"Panel Data Visualization in <i>R</i> (<b>panelView</b>) and <i>Stata</i> (<b>panelview</b>)","authors":"Hongyu Mou, Licheng Liu, Yiqing Xu","doi":"10.18637/jss.v107.i07","DOIUrl":"https://doi.org/10.18637/jss.v107.i07","url":null,"abstract":"We develop an R package panelView and a Stata package panelview for panel data visualization. They are designed to assist causal analysis with panel data and have three main functionalities: (1) They plot the treatment status and missing values in a panel dataset; (2) they visualize the temporal dynamics of the main variables of interest; and (3) they depict the bivariate relationships between a treatment variable and an outcome variable either by unit or in aggregate. These tools can help researchers better understand their panel datasets before conducting statistical analysis.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135700470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Despite the wide availability of software to estimate cross-sectional spatial models, there are only a few functions to estimate models dealing with spatial limited dependent variables. This paper fills this gap by introducing the new R package spldv. The package is based on generalized method of moments (GMM) estimators and includes a series of one- and two-step estimators based on different choices of the weighting matrix for the moment conditions in the first step, and different estimators for the variance-covariance matrix of the estimated coefficients. An important feature of spldv is that users can estimate the spatial Durbin model and compute the direct, indirect, and total effects in a user-friendly and flexible way.
{"title":"GMM Estimators for Binary Spatial Models in <i>R</i>","authors":"Gianfranco Piras, Mauricio Sarrias","doi":"10.18637/jss.v107.i08","DOIUrl":"https://doi.org/10.18637/jss.v107.i08","url":null,"abstract":"Despite the huge availability of software to estimate cross-sectional spatial models, there are only few functions to estimate models dealing with spatial limited dependent variable. This paper fills this gap introducing the new R package spldv. The package is based on generalized methods of moment (GMM) estimators and includes a series of one- and two-step estimators based on different choices of the weighting matrix for the moments conditions in the first step, and different estimators for the variance-covariance matrix of the estimated coefficients. An important feature of spldv is that users can estimate the spatial Durbin model and compute the direct, indirect, and total effects in a friendly and flexible way.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136259245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper introduces software packages for efficiently imputing missing data using deep learning methods in Python (MIDASpy) and R (rMIDAS). The packages implement a recently developed approach to multiple imputation known as MIDAS, which involves introducing additional missing values into the dataset, attempting to reconstruct these values with a type of unsupervised neural network known as a denoising autoencoder, and using the resulting model to draw imputations of originally missing data. These steps are executed by a fast and flexible algorithm that expands both the quantity and the range of data that can be analyzed with multiple imputation. To help users optimize the algorithm for their particular application, MIDASpy and rMIDAS offer a host of user-friendly tools for calibrating and validating the imputation model. We provide a detailed guide to these functionalities and demonstrate their usage on a large real dataset.
{"title":"Efficient Multiple Imputation for Diverse Data in <i>Python</i> and <i>R</i>: <b>MIDASpy</b> and <b>rMIDAS</b>","authors":"Ranjit Lall, Thomas Robinson","doi":"10.18637/jss.v107.i09","DOIUrl":"https://doi.org/10.18637/jss.v107.i09","url":null,"abstract":"This paper introduces software packages for efficiently imputing missing data using deep learning methods in Python (MIDASpy) and R (rMIDAS). The packages implement a recently developed approach to multiple imputation known as MIDAS, which involves introducing additional missing values into the dataset, attempting to reconstruct these values with a type of unsupervised neural network known as a denoising autoencoder, and using the resulting model to draw imputations of originally missing data. These steps are executed by a fast and flexible algorithm that expands both the quantity and the range of data that can be analyzed with multiple imputation. To help users optimize the algorithm for their particular application, MIDASpy and rMIDAS offer a host of user-friendly tools for calibrating and validating the imputation model. We provide a detailed guide to these functionalities and demonstrate their usage on a large real dataset.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"241 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136374652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
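The first step described in the MIDAS abstract above, deliberately hiding some observed values so that a denoising autoencoder can learn to reconstruct them, can be sketched in a few lines of Python. This is not the MIDASpy or rMIDAS API: the function names `corrupt` and `reconstruction_rmse`, the use of `np.nan` as the missing-value marker, and the choice to feed hidden entries to the network as zeros are all assumptions made for illustration. The autoencoder itself is omitted.

```python
import numpy as np


def corrupt(X, rate, rng):
    """Hide a random share of the observed entries for one training pass.

    X: 2-D float array in which np.nan marks originally missing entries.
    rate: probability of additionally hiding each observed entry.
    Returns the network input plus the boolean masks `observed` and `dropped`.
    """
    observed = ~np.isnan(X)
    dropped = observed & (rng.random(X.shape) < rate)
    # Hidden and originally missing entries are both fed as 0.0 (an assumption).
    X_in = np.where(observed & ~dropped, X, 0.0)
    return X_in, observed, dropped


def reconstruction_rmse(X_hat, X, observed):
    """RMSE over originally observed entries; truly missing cells have no target."""
    diff = (X_hat - X)[observed]
    return float(np.sqrt(np.mean(diff ** 2)))
```

Restricting the loss to originally observed cells is the key point: the model is scored on values it was not shown but whose true value is known, and is then used to fill the cells whose value was never known.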
This paper introduces the logitr R package for fast maximum likelihood estimation of multinomial logit and mixed logit models with unobserved heterogeneity across individuals, which is modeled by allowing parameters to vary randomly over individuals according to a chosen distribution. The package is faster than other similar packages such as mlogit, gmnl, mixl, and apollo, and it supports utility models specified with "preference space" or "willingness to pay (WTP) space" parameterizations, allowing for the direct estimation of marginal WTP. The typical procedure of computing WTP post-estimation using a preference space model can lead to unreasonable distributions of WTP across the population in mixed logit models. The paper provides a discussion of some of the implications of each utility parameterization for WTP estimates. It also highlights some of the design features that enable logitr's fast estimation and includes a benchmarking exercise with similar packages. Finally, the paper highlights additional features that are designed specifically for WTP space models, including a consistent user interface for specifying models in either space and a parallelized multi-start optimization loop, which is particularly useful for searching the solution space for different local minima when estimating models with non-convex log-likelihood functions.
{"title":"logitr: Fast Estimation of Multinomial and Mixed Logit Models with Preference Space and Willingness-to-Pay Space Utility Parameterizations","authors":"J. Helveston","doi":"10.18637/jss.v105.i10","DOIUrl":"https://doi.org/10.18637/jss.v105.i10","url":null,"abstract":"This paper introduces the logitr R package for fast maximum likelihood estimation of multinomial logit and mixed logit models with unobserved heterogeneity across individuals, which is modeled by allowing parameters to vary randomly over individuals according to a chosen distribution. The package is faster than other similar packages such as mlogit, gmnl, mixl, and apollo, and it supports utility models specified with\"preference space\"or\"willingness to pay (WTP) space\"parameterizations, allowing for the direct estimation of marginal WTP. The typical procedure of computing WTP post-estimation using a preference space model can lead to unreasonable distributions of WTP across the population in mixed logit models. The paper provides a discussion of some of the implications of each utility parameterization for WTP estimates. It also highlights some of the design features that enable logitr's performant estimation speed and includes a benchmarking exercise with similar packages. Finally, the paper highlights additional features that are designed specifically for WTP space models, including a consistent user interface for specifying models in either space and a parallelized multi-start optimization loop, which is particularly useful for searching the solution space for different local minima when estimating models with non-convex log-likelihood functions.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"15 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79549247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
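The two utility parameterizations named in the logitr abstract above can be compared in a small Python sketch. It assumes the common convention in which preference-space utility is u_j = beta'x_j - alpha * p_j and WTP-space utility is u_j = lam * (omega'x_j - p_j), so that omega = beta / alpha and lam = alpha. The function names and the numbers are hypothetical, and nothing here reproduces the logitr interface.

```python
import numpy as np


def mnl_probs(utilities):
    """Multinomial logit choice probabilities for a single choice set."""
    z = utilities - utilities.max()   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def utility_preference_space(X, price, beta, alpha):
    """u_j = beta'x_j - alpha * p_j, with alpha the price coefficient."""
    return X @ beta - alpha * price


def utility_wtp_space(X, price, omega, lam):
    """u_j = lam * (omega'x_j - p_j), with omega the marginal WTP vector."""
    return lam * (X @ omega - price)


# With fixed coefficients the two forms give identical choice probabilities.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # three alternatives, two attributes
price = np.array([1.0, 2.0, 3.5])
beta, alpha = np.array([0.8, 1.5]), 0.4
p_pref = mnl_probs(utility_preference_space(X, price, beta, alpha))
p_wtp = mnl_probs(utility_wtp_space(X, price, beta / alpha, alpha))

# With random coefficients, WTP implied by a preference-space model is a ratio
# of draws; a normal divided by a log-normal tends to have a long tail.
rng = np.random.default_rng(0)
beta_draws = rng.normal(1.0, 0.5, size=10_000)
alpha_draws = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
implied_wtp = beta_draws / alpha_draws
```

The equivalence holds only for fixed coefficients. Once coefficients are random, placing a distribution on omega directly (WTP space) and deriving it as a ratio (preference space) are different models, which is the distinction the abstract draws.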