
Journal of Statistical Software: Latest Articles

Broken Stick Model for Irregular Longitudinal Data
IF 5.8 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-01-01 | DOI: 10.18637/jss.v106.i07
S. van Buuren
Citations: 1
DataFrames.jl: Flexible and Fast Tabular Data in Julia
Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-01-01 | DOI: 10.18637/jss.v107.i04
Milan Bouchet-Valat, Bogumił Kamiński
DataFrames.jl is a package written for and in the Julia language offering flexible and efficient handling of tabular data sets in memory. Thanks to Julia's unique strengths, it provides an appealing set of features: Rich support for standard data processing tasks and excellent flexibility and efficiency for more advanced and non-standard operations. We present the fundamental design of the package and how it compares with implementations of data frames in other languages, its main features, performance, and possible extensions. We conclude with a practical illustration of typical data processing operations.
Citations: 2
Modeling Population Growth in R with the biogrowth Package
Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-01-01 | DOI: 10.18637/jss.v107.i01
Alberto Garre, Jeroen Koomen, Heidy M. W. den Besten, Marcel H. Zwietering
The growth of populations is of interest in a broad variety of fields, such as epidemiology, economics or biology. Although a large variety of growth models are available in the scientific literature, their application usually requires advanced knowledge of mathematical programming and statistical inference, especially when modelling growth under dynamic environmental conditions. This article presents the biogrowth package for R, which implements functions for modelling the growth of populations. It can predict growth under static or dynamic environments, considering the effect of an arbitrary number of environmental factors. Moreover, it can be used to fit growth models to data gathered under static or dynamic environmental conditions. The package allows the user to fix any model parameter prior to the fit, an approach that can mitigate identifiability issues associated with growth models. The package includes common S3 methods for visualization and statistical analysis (summary of the fit, predictions, ...), easing result interpretation. It also includes functions for model comparison/selection. We illustrate the functions in biogrowth using examples from food science and economy.
Citations: 1
carat: An R Package for Covariate-Adaptive Randomization in Clinical Trials
Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-01-01 | DOI: 10.18637/jss.v107.i02
Wei Ma, Xiaoqing Ye, Fuyi Tu, Feifang Hu
Covariate-adaptive randomization is gaining popularity in clinical trials because it enables the generation of balanced allocations with respect to covariates. Over the past decade, substantial progress has been made in both innovative randomization procedures and the theoretical properties of associated inferences. However, these results are scattered across the literature, and no single toolkit exists for clinical trial practitioners and researchers to conduct and evaluate these methods. The R package carat is proposed to address this need. It facilitates a broad range of covariate-adaptive randomization and testing procedures, such as the most common and classical methods, and also reflects recent developments in the field. The package contains comprehensive evaluation and comparison tools for use in both randomization procedures and tests. This enables power analysis to be conducted to assist the planning of a covariate-adaptive clinical trial. The package also implements a command-line interface to allow for an interactive allocation procedure, which is typically the case in real-world applications. In this paper, the features and functionalities of carat are presented.
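To make the idea of covariate-adaptive allocation concrete, here is a minimal Python sketch of Pocock and Simon's minimization, one classical procedure of this kind, for two arms. It is not the carat interface: the function name `minimization_assign`, the `p_favor` biased-coin probability, and the toy covariates are all assumptions made for illustration.

```python
import random


def minimization_assign(patients, p_favor=0.85, seed=0):
    """Pocock-Simon style minimization for two arms, 'A' and 'B'.

    patients: list of dicts mapping covariate name -> level, in enrolment order.
    Returns the list of assigned arms.
    """
    rng = random.Random(seed)
    counts = {}  # (covariate, level) -> {'A': n, 'B': n} among patients already allocated
    arms = []
    for patient in patients:
        # Total marginal imbalance that would result from each hypothetical assignment.
        imbalance = {}
        for arm in ("A", "B"):
            total = 0
            for key in patient.items():
                c = counts.get(key, {"A": 0, "B": 0})
                n_a = c["A"] + (arm == "A")
                n_b = c["B"] + (arm == "B")
                total += abs(n_a - n_b)
            imbalance[arm] = total
        if imbalance["A"] == imbalance["B"]:
            arm = rng.choice(["A", "B"])  # tie: fair coin
        else:
            best = "A" if imbalance["A"] < imbalance["B"] else "B"
            other = "B" if best == "A" else "A"
            # Biased coin: favor the arm that reduces imbalance.
            arm = best if rng.random() < p_favor else other
        for key in patient.items():
            counts.setdefault(key, {"A": 0, "B": 0})[arm] += 1
        arms.append(arm)
    return arms


# Hypothetical enrolment sequence with two covariates.
cohort = [
    {"sex": "F", "age": "<60"},
    {"sex": "M", "age": "<60"},
    {"sex": "F", "age": ">=60"},
    {"sex": "M", "age": ">=60"},
]
allocation = minimization_assign(cohort)
```

Setting `p_favor=1.0` makes the rule deterministic whenever the two arms differ in imbalance; values below 1 keep some randomness in every assignment.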
Citations: 0
cglasso: An R Package for Conditional Graphical Lasso Inference with Censored and Missing Values
IF 5.8 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-01-01 | DOI: 10.18637/jss.v105.i01
L. Augugliaro, G. Sottile, E. C. Wit, V. Vinciotti
Sparse graphical models have revolutionized multivariate inference. With the advent of high-dimensional multivariate data in many applied fields, these methods are able to detect a much lower-dimensional structure, often represented via a sparse conditional independence graph. There have been numerous extensions of such methods in the past decade. Many practical applications have additional covariates or suffer from missing or censored data. Despite the development of these extensions of sparse inference methods for graphical models, there have so far been no implementations for, e.g., conditional graphical models. Here we present the general-purpose package cglasso for estimating sparse conditional Gaussian graphical models with potentially missing or censored data. The method employs an efficient expectation-maximization estimation of an ℓ1-penalized likelihood via a block-coordinate descent algorithm. The package has a user-friendly data manipulation interface. It estimates a solution path and includes various automatic selection algorithms for the two ℓ1 tuning parameters, associated with the sparse precision matrix and sparse regression coefficients, respectively. The package pays particular attention to the visualization of the results, both by means of marginal tables and figures, and of the inferred conditional independence graphs. This package provides a unique and computationally efficient implementation of a conditional Gaussian graphical model that is able to deal with the additional complications of missing and censored data. As such it constitutes an important contribution for empirical scientists wishing to detect sparse structures in high-dimensional data.
Citations: 0
Elastic Net Regularization Paths for All Generalized Linear Models
IF 5.8 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-01-01 | Epub Date: 2023-03-23 | DOI: 10.18637/jss.v106.i01
J Kenneth Tay, Balasubramanian Narasimhan, Trevor Hastie

The lasso and elastic net are popular regularized regression models for supervised learning. Friedman, Hastie, and Tibshirani (2010) introduced a computationally efficient algorithm for computing the elastic net regularization path for ordinary least squares regression, logistic regression and multinomial logistic regression, while Simon, Friedman, Hastie, and Tibshirani (2011) extended this work to Cox models for right-censored data. We further extend the reach of the elastic net-regularized regression to all generalized linear model families, Cox models with (start, stop] data and strata, and a simplified version of the relaxed lasso. We also discuss convenient utility functions for measuring the performance of these fitted models.
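As a rough illustration of the computation behind an elastic net fit (not the glmnet code itself), the Python sketch below runs cyclic coordinate descent with soft-thresholding, the approach introduced by Friedman, Hastie, and Tibshirani (2010), for the least-squares case at a single penalty value. The function names, the no-intercept simplification, and the toy data are assumptions made for this example.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    """Cyclic coordinate descent for the objective

        (1 / (2n)) * sum_i (y_i - x_i'b)^2
            + lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2)

    X is a list of rows; no intercept is fitted, and no column may be all zeros
    when lam * (1 - alpha) == 0 (the denominator would vanish).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b for the current b (initially b = 0)
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            # Inner product of column j with the partial residual
            # (the residual with coordinate j's own contribution added back).
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            denom = sum(c * c for c in col) / n + lam * (1.0 - alpha)
            new_bj = soft_threshold(rho, lam * alpha) / denom
            delta = new_bj - beta[j]
            resid = [r - c * delta for c, r in zip(col, resid)]
            beta[j] = new_bj
    return beta


# Toy data with orthogonal columns, so the updates can be checked by hand.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 1.0, 2.0, 1.0]
beta_ols = elastic_net_cd(X, y, lam=0.0, alpha=1.0)     # no penalty
beta_lasso = elastic_net_cd(X, y, lam=0.25, alpha=1.0)  # lasso shrinkage
```

With `alpha=1` the penalty reduces to the lasso and with `alpha=0` to ridge regression. A regularization path is obtained by repeating the fit over a decreasing grid of `lam` values, typically warm-starting each fit from the previous solution.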

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10153598/pdf/nihms-1843576.pdf
Citations: 0
Panel Data Visualization in R (panelView) and Stata (panelview)
Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-01-01 | DOI: 10.18637/jss.v107.i07
Hongyu Mou, Licheng Liu, Yiqing Xu
We develop an R package panelView and a Stata package panelview for panel data visualization. They are designed to assist causal analysis with panel data and have three main functionalities: (1) They plot the treatment status and missing values in a panel dataset; (2) they visualize the temporal dynamics of the main variables of interest; and (3) they depict the bivariate relationships between a treatment variable and an outcome variable either by unit or in aggregate. These tools can help researchers better understand their panel datasets before conducting statistical analysis.
Citations: 1
GMM Estimators for Binary Spatial Models in R
Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-01-01 | DOI: 10.18637/jss.v107.i08
Gianfranco Piras, Mauricio Sarrias
Despite the wide availability of software to estimate cross-sectional spatial models, there are only a few functions to estimate models dealing with spatial limited dependent variables. This paper fills this gap by introducing the new R package spldv. The package is based on generalized method of moments (GMM) estimators and includes a series of one- and two-step estimators based on different choices of the weighting matrix for the moment conditions in the first step, and different estimators for the variance-covariance matrix of the estimated coefficients. An important feature of spldv is that users can estimate the spatial Durbin model and compute the direct, indirect, and total effects in a friendly and flexible way.
Citations: 0
Efficient Multiple Imputation for Diverse Data in Python and R: MIDASpy and rMIDAS
Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-01-01 | DOI: 10.18637/jss.v107.i09
Ranjit Lall, Thomas Robinson
This paper introduces software packages for efficiently imputing missing data using deep learning methods in Python (MIDASpy) and R (rMIDAS). The packages implement a recently developed approach to multiple imputation known as MIDAS, which involves introducing additional missing values into the dataset, attempting to reconstruct these values with a type of unsupervised neural network known as a denoising autoencoder, and using the resulting model to draw imputations of originally missing data. These steps are executed by a fast and flexible algorithm that expands both the quantity and the range of data that can be analyzed with multiple imputation. To help users optimize the algorithm for their particular application, MIDASpy and rMIDAS offer a host of user-friendly tools for calibrating and validating the imputation model. We provide a detailed guide to these functionalities and demonstrate their usage on a large real dataset.
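The first step the abstract describes, introducing additional missing values before training the denoising autoencoder, can be sketched in a few lines of Python. This is only an illustration of the corruption step, not the MIDASpy or rMIDAS API: the function name `corrupt`, the `rate` parameter, and the list-of-lists data layout are hypothetical, and the reconstruction network is omitted entirely.

```python
import random


def corrupt(rows, rate=0.5, seed=0):
    """Hide a random share of the observed cells.

    rows: list of lists, with None marking originally missing cells.
    Returns (corrupted_rows, mask), where mask[i][j] is True for cells that
    were observed in `rows` but are set to None in `corrupted_rows`.
    """
    rng = random.Random(seed)
    corrupted, mask = [], []
    for row in rows:
        new_row, row_mask = [], []
        for value in row:
            # Only observed cells can be hidden; missing cells stay missing.
            hide = value is not None and rng.random() < rate
            new_row.append(None if hide else value)
            row_mask.append(hide)
        corrupted.append(new_row)
        mask.append(row_mask)
    return corrupted, mask


# Hypothetical data with two originally missing cells.
data = [[1.0, None, 3.0], [4.0, 5.0, None]]
noisy, hidden = corrupt(data, rate=0.5, seed=42)
```

A model trained to reconstruct the cells flagged in `hidden` from `noisy` can be scored against the known original values, which is what makes the artificially hidden cells useful as a training signal.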
Citations: 2
logitr: Fast Estimation of Multinomial and Mixed Logit Models with Preference Space and Willingness-to-Pay Space Utility Parameterizations
IF 5.8 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2022-10-19 | DOI: 10.18637/jss.v105.i10
J. Helveston
This paper introduces the logitr R package for fast maximum likelihood estimation of multinomial logit and mixed logit models with unobserved heterogeneity across individuals, which is modeled by allowing parameters to vary randomly over individuals according to a chosen distribution. The package is faster than other similar packages such as mlogit, gmnl, mixl, and apollo, and it supports utility models specified with "preference space" or "willingness to pay (WTP) space" parameterizations, allowing for the direct estimation of marginal WTP. The typical procedure of computing WTP post-estimation using a preference space model can lead to unreasonable distributions of WTP across the population in mixed logit models. The paper provides a discussion of some of the implications of each utility parameterization for WTP estimates. It also highlights some of the design features that enable logitr's performant estimation speed and includes a benchmarking exercise with similar packages. Finally, the paper highlights additional features that are designed specifically for WTP space models, including a consistent user interface for specifying models in either space and a parallelized multi-start optimization loop, which is particularly useful for searching the solution space for different local minima when estimating models with non-convex log-likelihood functions.
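The relation between the two parameterizations can be shown with a small Python sketch for a fixed-parameter multinomial logit. It assumes the usual forms u = β'x + αp in preference space and u = λ(ω'x − p) in WTP space, with λ = −α and ω = β/λ. The attribute values, prices, and coefficients are made up, and none of the function names come from logitr.

```python
import math


def mnl_probs(utilities):
    """Multinomial logit choice probabilities (softmax of the utilities)."""
    m = max(utilities)  # subtract the maximum for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def utility_pref(x, price, beta, alpha):
    """Preference space: u = beta'x + alpha * price (alpha < 0 for price)."""
    return sum(b * xi for b, xi in zip(beta, x)) + alpha * price


def utility_wtp(x, price, omega, lam):
    """WTP space: u = lam * (omega'x - price), with lam > 0 a scale parameter."""
    return lam * (sum(w * xi for w, xi in zip(omega, x)) - price)


# Hypothetical choice set: (attribute vector, price) for each alternative.
alts = [([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0), ([1.0, 1.0], 4.5)]
beta, alpha = [1.2, 0.8], -0.5

# Reparameterize the preference-space coefficients into WTP space.
lam = -alpha
omega = [b / lam for b in beta]  # marginal WTP for each attribute

p_pref = mnl_probs([utility_pref(x, p, beta, alpha) for x, p in alts])
p_wtp = mnl_probs([utility_wtp(x, p, omega, lam) for x, p in alts])
```

With fixed coefficients the two spaces give identical choice probabilities, and `omega` is simply the ratio of each attribute coefficient to the negative price coefficient. The difference the abstract points to arises in mixed logit models, where the coefficients are random: in WTP space the distribution of `omega` is specified directly rather than being implied by a ratio of random coefficients.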
Citations: 2