Journal of Statistical Software: Latest Articles

Python and R for the Modern Data Scientist
IF 5.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2022-01-01; DOI: 10.18637/jss.v103.b02
C. Lortie
Computation in many fields, including those that use statistical software, is increasingly driven by needs that can be addressed in many programming ecosystems. In projects that require statistical analyses, R and Python are two frequently used resources. In ecology, R is the most frequently used (Lai, Lortie, Muenchen, Yang, and Ma 2019). In bioinformatic gene set analyses, R is also more frequently used in peer-reviewed publications, but Python is still an important statistical resource depending on the specific project (Xie, Jauhari, and Mora 2021). Python outcompetes other languages in use for machine learning and some forms of factor analyses (Hao and Ho 2019; Persson and Khojasteh 2021; Raschka, Patterson, and Nolet 2020). However, the relative frequency with which a tool is used for statistical analyses is only one metric of importance and not necessarily a proxy for its merit or its capacity to support innovation and efficiency in analyses for practitioners (Zhao, Yan, and Li 2018). It is thus critical that we explore contrasts between at least these two common software languages that support statistics, because data scientists can become isolated or polarized within their specific competencies, ideologies, and workflows. A high-level discussion of strengths and weaknesses specific to data endeavors with statistics is germane both to decisions on specific projects and to competency development as a scientist.
Citations: 1
ParMA: Parallelized Bayesian Model Averaging for Generalized Linear Models
IF 5.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2022-01-01; DOI: 10.18637/jss.v104.i02
R. Lucchetti, Luca Pedini
This paper describes the gretl function package ParMA, which provides Bayesian model averaging (BMA) in generalized linear models. To overcome the lack of an analytical specification for many of the models covered, the package features an implementation of the reversible jump Markov chain Monte Carlo technique, following the original idea by Green (1995), as a flexible tool to model several specifications. Particular attention is devoted to computational aspects such as the automation of the model building procedure and the parallelization of the sampling scheme.
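To make the averaging step of BMA concrete, the following is a minimal Python sketch, not the ParMA package or its reversible jump sampler. It assumes that a log marginal likelihood is already available for each candidate model (ParMA instead explores the model space by sampling); the function names and the input numbers are hypothetical.

```python
import math

def posterior_model_probs(log_marginals, priors=None):
    """Posterior model probabilities from log marginal likelihoods.

    Uses the log-sum-exp trick so very negative log values do not underflow.
    A uniform prior over models is assumed when `priors` is not given.
    """
    m = len(log_marginals)
    if priors is None:
        priors = [1.0 / m] * m
    logs = [lm + math.log(p) for lm, p in zip(log_marginals, priors)]
    top = max(logs)
    unnormalized = [math.exp(v - top) for v in logs]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def bma_estimate(estimates, probs):
    """Model-averaged estimate: per-model estimates weighted by model probability."""
    return sum(e * p for e, p in zip(estimates, probs))

# Hypothetical inputs: two models whose marginal likelihoods differ by a factor of 3.
probs = posterior_model_probs([-1000.0, -1000.0 + math.log(3.0)])
averaged = bma_estimate([0.0, 2.0], probs)
```

With these illustrative inputs the second model receives three times the weight of the first, so the averaged coefficient is pulled toward its estimate.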
Citations: 0
synthACS: Spatial Microsimulation Modeling with Synthetic American Community Survey Data
IF 5.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2022-01-01; DOI: 10.18637/jss.v104.i07
Alex P. Whitworth
synthACS is an R package that provides flexible tools for building synthetic micro-datasets based on American Community Survey (ACS) base tables, allows data extensibility, and enables spatial microsimulation modeling (SMSM) via simulated annealing. To our knowledge, it is the first R package to provide broadly applicable tools for SMSM with ACS data as well as the first SMSM implementation that uses unequal probability sampling in the simulated annealing algorithm. In this paper, we contextualize these developments within the SMSM literature, provide a hands-on user guide to the package synthACS, present a case study of SMSM related to population dynamics, and note areas for future research.
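The idea of fitting a synthetic population to target counts by simulated annealing can be sketched roughly as follows. This is an illustrative Python toy, not the synthACS API: the error measure (total absolute error against target category counts), the cooling schedule, and all names and numbers are assumptions. Unequal probability sampling appears here only as the `weights` used when drawing replacement candidates.

```python
import math
import random

def total_absolute_error(sample, target):
    """Sum over categories of |count in sample - target count|."""
    counts = {}
    for item in sample:
        counts[item] = counts.get(item, 0) + 1
    keys = set(counts) | set(target)
    return sum(abs(counts.get(k, 0) - target.get(k, 0)) for k in keys)

def anneal(candidates, weights, target, n, steps=2000, temp=1.0, cooling=0.995, seed=0):
    """Select n candidates whose category counts approximate `target`."""
    rng = random.Random(seed)
    current = rng.choices(candidates, weights=weights, k=n)
    cur_err = total_absolute_error(current, target)
    best, best_err = list(current), cur_err
    for _ in range(steps):
        if best_err == 0:
            break
        # Propose swapping one member for a candidate drawn with unequal probability.
        proposal = list(current)
        proposal[rng.randrange(n)] = rng.choices(candidates, weights=weights, k=1)[0]
        err = total_absolute_error(proposal, target)
        # Always accept improvements; accept worse proposals with a
        # probability that shrinks as the temperature cools.
        if err <= cur_err or rng.random() < math.exp((cur_err - err) / temp):
            current, cur_err = proposal, err
            if cur_err < best_err:
                best, best_err = list(current), cur_err
        temp *= cooling
    return best, best_err

# Hypothetical target: 10 synthetic individuals split 5/3/2 across three categories.
target_counts = {"a": 5, "b": 3, "c": 2}
population, error = anneal(["a", "b", "c"], [0.5, 0.3, 0.2], target_counts, n=10)
```

Drawing replacements with probabilities close to the target shares tends to produce useful proposals more often than uniform draws, which is one plausible motivation for unequal probability sampling in this setting.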
Citations: 3
rags2ridges: A One-Stop-ℓ2-Shop for
IF 5.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2022-01-01; DOI: 10.18637/jss.v102.i04
Carel F. W. Peeters, A. E. Bilgrau, W. V. van Wieringen
Citations: 1
[RETRACTED ARTICLE] irtplay: An R Package for Unidimensional Item Response Theory Modeling
IF 5.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2022-01-01; DOI: 10.18637/jss.v103.i12
Hwanggyu Lim, C. Wells
Citations: 0
Doing Meta-Analysis with R - A Hands-On Guide
IF 5.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2022-01-01; DOI: 10.18637/jss.v102.b02
C. Lortie
Scientific synthesis is a diverse field of contemporary science. Syntheses advance knowledge in many domains and can include data compilation, theory syntheses, methods contrasts, and systematic reviews with meta-analyses through an integrated and big-picture view of evidence (Halpern et al. 2020). All these knowledge tools are typically strongly supported by statistical software including the open-source programming language R. Within this environment, there are nearly 100 packages to support meta-analyses each with different functions and specific capabilities (Lortie and Filazzola 2020). Meta-analyses are defined in most domains as the calculation of effect sizes or a weighted relative strength of evidence from a set of studies or trials to then subsequently examine high-level statistical patterns and variance (Gurevitch, Koricheva, Nakagawa, and Stewart 2018). They are increasingly used in many fields of science to examine consilience in hypotheses (Lortie 2014) and have been proposed as the gold or even platinum standard of evidence when there is statistical agreement in the efficacy of an intervention across studies (Stegenga 2011). Consequently, there is a critical need for accessible, pragmatic publications, resources, and texts that enable scientists with varying levels of expertise to engage in scientific syntheses using meta-analysis.
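As an illustration of a weighted effect size, the sketch below computes a fixed-effect, inverse-variance pooled estimate in Python. It is one common weighting scheme rather than the method of the reviewed book or of any particular R package, and the effect sizes and variances are made-up numbers.

```python
import math

def pooled_effect(effects, variances):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    if len(effects) != len(variances) or not effects:
        raise ValueError("effects and variances must be non-empty and equal in length")
    weights = [1.0 / v for v in variances]  # more precise studies get more weight
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)

# Hypothetical study-level effect sizes and their sampling variances.
study_effects = [0.2, 0.5, 0.8]
study_variances = [0.04, 0.01, 0.04]
estimate, std_error = pooled_effect(study_effects, study_variances)
```

A random-effects model would add a between-study variance component to each study's variance before weighting; that step is omitted here for brevity.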
Citations: 622
plot3logit: Ternary Plots for Interpreting Trinomial Regression Models
IF 5.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2022-01-01; DOI: 10.18637/jss.v103.c01
F. Santi, M. M. Dickson, G. Espa, D. Giuliani
This paper presents the R package plot3logit, which enables the covariate effects of trinomial regression models to be represented graphically by means of a ternary plot. The aim of the plot is to help interpret regression coefficients in terms of the effects that a change in the values of the regressors has on the probability distribution of the dependent variable. Such changes may involve either a single regressor or a group of them (composite changes), and the package permits both cases to be handled in a user-friendly way. Moreover, plot3logit can compute and draw confidence regions of the effects of covariate changes and enables multiple changes and profiles to be represented and compared jointly. Upstream and downstream compatibility makes the package able to work with other R packages or with applications other than R.
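The quantity a ternary plot displays is a vector of three probabilities that sum to one. The Python sketch below shows how a change in a regressor moves that vector under a trinomial logit with the first category as reference. It is a minimal illustration, not the plot3logit interface, and the coefficient values are hypothetical.

```python
import math

def trinomial_probs(x, beta2, beta3):
    """Category probabilities of a trinomial logit, category 1 as reference.

    x, beta2 and beta3 are equal-length sequences; x[0] is 1 for the intercept.
    """
    eta2 = sum(b * v for b, v in zip(beta2, x))
    eta3 = sum(b * v for b, v in zip(beta3, x))
    denom = 1.0 + math.exp(eta2) + math.exp(eta3)
    return (1.0 / denom, math.exp(eta2) / denom, math.exp(eta3) / denom)

# Hypothetical coefficients: the regressor raises category 2 and lowers category 3.
beta2 = [0.0, 1.0]
beta3 = [0.0, -1.0]
before = trinomial_probs([1.0, 0.0], beta2, beta3)  # regressor at 0
after = trinomial_probs([1.0, 1.0], beta2, beta3)   # regressor increased by one unit
```

On a ternary plot, `before` and `after` would be two points in the triangle, and the arrow between them is the effect of the covariate change.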
Citations: 0
Automatic Identification and Forecasting of Structural Unobserved Components Models with UComp
IF 5.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2022-01-01; DOI: 10.18637/jss.v103.i09
D. J. Pedregal
UComp is a powerful library for building unobserved components models, useful for forecasting and other important operations, such as de-trending, cycle analysis, seasonal adjustment, signal extraction, etc. One of the most outstanding features that makes UComp unique among related software implementations is that models may be built automatically by identification algorithms (three versions are available). These algorithms select the best model among many possible combinations. Another relevant feature is that it is coded in C++, opening the door to linking it to different popular and widely used environments, like R, MATLAB, Octave, Python, etc. The implemented models for the components are more general than the usual ones in the field of unobserved components modeling, including different types of trend, cycle, seasonal and irregular components, input variables and outlier detection. The automatic character of the algorithms required the development of many complementary algorithms to control performance and make it applicable to as many different time series as possible. The library is open source and available in different formats in public repositories. The performance of the library is illustrated on real data in several varied examples.
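For readers unfamiliar with unobserved components models, the simplest member of the family is the local level model, in which an observed series is a slowly drifting level plus noise. The Python sketch below runs a Kalman filter for that model. It does not use UComp or reproduce its automatic identification; the variances and the near-diffuse initial state are assumed values chosen for illustration.

```python
def local_level_filter(y, var_eps, var_eta, a0=0.0, p0=1e7):
    """Kalman filter for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t.

    var_eps is the observation noise variance, var_eta the level noise variance.
    Returns the filtered estimates of the level mu_t.
    """
    a, p = a0, p0
    filtered = []
    for obs in y:
        f = p + var_eps        # variance of the one-step prediction error
        k = p / f              # Kalman gain
        a = a + k * (obs - a)  # update the level with the new observation
        p = p * (1.0 - k)      # variance of the updated level
        filtered.append(a)
        p = p + var_eta        # predict: the level drifts before the next step
    return filtered

# Hypothetical series: a constant signal, so the filtered level should settle near 5.
levels = local_level_filter([5.0] * 20, var_eps=1.0, var_eta=0.1)
```

Richer structural models add slope, cycle, and seasonal states to the same recursion.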
Citations: 0
Regularized Ordinal Regression and the ordinalNet R Package
IF 5.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2021-09-01; DOI: 10.18637/jss.v099.i06
Michael J Wurm, Paul J Rathouz, Bret M Hanlon

Regularization techniques such as the lasso (Tibshirani 1996) and elastic net (Zou and Hastie 2005) can be used to improve regression model coefficient estimation and prediction accuracy, as well as to perform variable selection. Ordinal regression models are widely used in applications where the use of regularization could be beneficial; however, these models are not included in many popular software packages for regularized regression. We propose a coordinate descent algorithm to fit a broad class of ordinal regression models with an elastic net penalty. Furthermore, we demonstrate that each model in this class generalizes to a more flexible form that can be used to model either ordered or unordered categorical response data. We call this the elementwise link multinomial-ordinal (ELMO) class, and it includes widely used models such as multinomial logistic regression (which also has an ordinal form) and ordinal logistic regression (which also has an unordered multinomial form). We introduce an elastic net penalty class that applies to either model form; additionally, this penalty can be used to shrink a non-ordinal model toward its ordinal counterpart. Finally, we introduce the R package ordinalNet, which implements the algorithm for this model class.
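The core of a coordinate descent step under an elastic net penalty is a soft-thresholding update. The Python sketch below shows that update for penalized least squares with standardized predictors, the textbook setting; it is not the ordinalNet algorithm, which applies the idea to ordinal and multinomial likelihoods. The function names and the parameters `lam` (overall penalty strength) and `alpha` (lasso/ridge mix) are illustrative assumptions.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma, returning exactly 0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net_update(z, lam, alpha):
    """One coordinate update for elastic-net penalized least squares.

    z is the unpenalized least-squares value of the coefficient computed from
    partial residuals, assuming the predictor is standardized to unit variance.
    The lasso part (lam * alpha) thresholds; the ridge part (lam * (1 - alpha))
    shrinks proportionally.
    """
    return soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))

# Hypothetical values: a strong coefficient is shrunk, a weak one is set to zero.
strong = elastic_net_update(3.0, lam=1.0, alpha=0.5)
weak = elastic_net_update(0.3, lam=1.0, alpha=0.5)
```

The exact zero returned for small `z` is what gives the lasso component its variable-selection behavior.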
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8432594/pdf/nihms-1018361.pdf
Citations: 0
Robust Analysis of Sample Selection Models through the R Package ssmrob
IF 5.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2021-08-21; DOI: 10.18637/jss.v099.i04
Mikhail Zhelonkin, E. Ronchetti
The aim of this paper is to describe the implementation and to provide a tutorial for the R package ssmrob, which is developed for robust estimation and inference in sample selection and endogenous treatment models. The sample selectivity issue occurs in practice in various fields, when a non-random sample of a population is observed, i.e., when observations are present according to some selection rule. It is well known that the classical estimators introduced by Heckman (1979) are very sensitive to small deviations from the distributional assumptions (typically the normality assumption on the error terms). Zhelonkin, Genton, and Ronchetti (2016) investigated the robustness properties of these estimators and proposed robust alternatives to the estimator and the corresponding test. We briefly discuss the robust approach and demonstrate its performance in practice by providing several empirical examples. The package can be used both to produce a complete robust statistical analysis of these models which complements the classical one and as a set of useful tools for exploratory data analysis. Specifically, robust estimators and standard errors of the coefficients of both the selection and the regression equations are provided together with a robust test of selectivity. The package therefore provides additional useful information to practitioners in different fields of applications by enhancing their statistical analysis of these models.
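For context on the classical estimator mentioned above, Heckman's two-step procedure corrects the outcome regression with the inverse Mills ratio evaluated at the fitted selection index. The Python sketch below computes only that ratio under the normality assumption. It illustrates the classical correction term, not the robust estimators implemented in ssmrob, and the function name is hypothetical.

```python
import math

def inverse_mills_ratio(z):
    """phi(z) / Phi(z) for the standard normal density phi and CDF Phi."""
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return density / cdf

# The ratio shrinks as the selection index grows, i.e. as selection becomes
# more likely and the correction matters less.
at_zero = inverse_mills_ratio(0.0)
at_two = inverse_mills_ratio(2.0)
```

Because the ratio depends directly on the normal density and CDF, departures from normality in the error terms feed straight into the correction, which is one way to see why the classical estimator is sensitive to distributional assumptions.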
Citations: 1