Journal of Statistical Software最新文献

英文中文

Robust Mediation Analysis: The R Package robmed 稳健中介分析:R包

IF 5.8 2区计算机科学 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Statistical Software

Pub Date : 2022-02-24 DOI: 10.18637/jss.v103.i13

A. Alfons, N. Ateş, P. Groenen

Mediation analysis is one of the most widely used statistical techniques in the social, behavioral, and medical sciences. Mediation models allow to study how an independent variable affects a dependent variable indirectly through one or more intervening variables, which are called mediators. The analysis is often carried out via a series of linear regressions, in which case the indirect effects can be computed as products of coefficients from those regressions. Statistical significance of the indirect effects is typically assessed via a bootstrap test based on ordinary least-squares estimates. However, this test is sensitive to outliers or other deviations from normality assumptions, which poses a serious threat to empirical testing of theory about mediation mechanisms. The R package robmed implements a robust procedure for mediation analysis based on the fast-and-robust bootstrap methodology for robust regression estimators, which yields reliable results even when the data deviate from the usual normality assumptions. Various other procedures for mediation analysis are included in package robmed as well. Moreover, robmed introduces a new formula interface that allows to specify mediation models with a single formula, and provides various plots for diagnostics or visual representation of the results.

调解分析是在社会、行为和医学科学中使用最广泛的统计技术之一。中介模型允许研究自变量如何通过一个或多个中介变量间接影响因变量，这些中介变量被称为中介。分析通常通过一系列线性回归进行，在这种情况下，间接影响可以计算为这些回归系数的乘积。间接效应的统计显著性通常通过基于普通最小二乘估计的自举检验来评估。然而，该测试对异常值或偏离正态假设的其他偏差很敏感，这对调解机制理论的实证检验构成了严重威胁。R包实现了一个健壮的中介分析过程，该过程基于健壮回归估计器的快速健壮的自举方法，即使在数据偏离通常的正态性假设时也会产生可靠的结果。包中还包括用于中介分析的各种其他程序。此外，robmed引入了一个新的公式接口，该接口允许使用单个公式指定中介模型，并提供用于诊断的各种图或结果的可视化表示。

{"title":"Robust Mediation Analysis: The R Package robmed","authors":"A. Alfons, N. Ateş, P. Groenen","doi":"10.18637/jss.v103.i13","DOIUrl":"https://doi.org/10.18637/jss.v103.i13","url":null,"abstract":"Mediation analysis is one of the most widely used statistical techniques in the social, behavioral, and medical sciences. Mediation models allow to study how an independent variable affects a dependent variable indirectly through one or more intervening variables, which are called mediators. The analysis is often carried out via a series of linear regressions, in which case the indirect effects can be computed as products of coefficients from those regressions. Statistical significance of the indirect effects is typically assessed via a bootstrap test based on ordinary least-squares estimates. However, this test is sensitive to outliers or other deviations from normality assumptions, which poses a serious threat to empirical testing of theory about mediation mechanisms. The R package robmed implements a robust procedure for mediation analysis based on the fast-and-robust bootstrap methodology for robust regression estimators, which yields reliable results even when the data deviate from the usual normality assumptions. Various other procedures for mediation analysis are included in package robmed as well. Moreover, robmed introduces a new formula interface that allows to specify mediation models with a single formula, and provides various plots for diagnostics or visual representation of the results.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"40 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74051967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Hierarchical Clustering with Contiguity Constraint in R 具有邻近约束的分层聚类

IF 5.8 2区计算机科学 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Statistical Software

Pub Date : 2022-01-01 DOI: 10.18637/jss.v103.i07

G. Guénard, P. Legendre

This article presents a new implementation of hierarchical clustering for the R language that allows one to apply spatial or temporal contiguity constraints during the clustering process. The need for contiguity constraint arises, for instance, when one wants to partition a map into different domains of similar physical conditions, identify discontinuities in time series, group regional administrative units with respect to their performance, and so on. To increase computation efficiency, we programmed the core functions in plain C . The result is a new R function, constr.hclust , which is distributed in package adespatial . The program implements the general agglomerative hierarchical clustering algorithm described by Lance and Williams (1966; 1967), with the particularity of allowing only clusters that are contiguous in geographic space or along time to fuse at any given step. Contiguity can be defined with respect to space or time. Information about spatial contiguity is provided by a connection network among sites, with edges describing the links between connected sites. Clustering with a temporal contiguity constraint is also known as chronological clustering. Information on temporal contiguity can be implicitly provided as the rank positions of observations in the time series. The implementation was mirrored on that found in the hierarchical clustering function hclust of the standard R package stats ( R Core Team 2022). We transcribed that function from Fortran to C and added the functionality to apply constraints when running the function. The implementation is efficient. It is limited mainly by input/output access as massive amounts of memory are potentially needed to store copies of the dissimilarity matrix and update its elements when analyzing large problems. We provided R computer code for plotting results for numbers of clusters.

本文为R语言提供了一种新的分层聚类实现，它允许在聚类过程中应用空间或时间上的连续性约束。例如，当需要将地图划分为具有相似物理条件的不同域、识别时间序列中的不连续性、根据其性能对区域管理单元进行分组等等时，就会出现对连续性约束的需求。为了提高计算效率，我们用C语言编写了核心函数。结果是一个新的R函数，constr。Hclust，它分布在包空间中。该程序实现了Lance和Williams (1966;1967)，其特点是只允许在地理空间或时间上连续的集群在任何给定的步骤上融合。连续性可以根据空间或时间来定义。关于空间连续性的信息由站点之间的连接网络提供，边缘描述了连接站点之间的链接。具有时间连续性约束的聚类也称为时间聚类。时间连续性的信息可以隐式地作为观测值在时间序列中的秩位置提供。该实现是基于标准R包统计(R Core Team 2022)的分层聚类功能hclust中的实现的。我们将该函数从Fortran转录到C，并添加了在运行函数时应用约束的功能。实现是高效的。它主要受到输入/输出访问的限制，因为在分析大型问题时，可能需要大量内存来存储不同矩阵的副本并更新其元素。我们提供了R计算机代码来绘制集群数量的结果。

{"title":"Hierarchical Clustering with Contiguity Constraint in R","authors":"G. Guénard, P. Legendre","doi":"10.18637/jss.v103.i07","DOIUrl":"https://doi.org/10.18637/jss.v103.i07","url":null,"abstract":"This article presents a new implementation of hierarchical clustering for the R language that allows one to apply spatial or temporal contiguity constraints during the clustering process. The need for contiguity constraint arises, for instance, when one wants to partition a map into different domains of similar physical conditions, identify discontinuities in time series, group regional administrative units with respect to their performance, and so on. To increase computation efficiency, we programmed the core functions in plain C . The result is a new R function, constr.hclust , which is distributed in package adespatial . The program implements the general agglomerative hierarchical clustering algorithm described by Lance and Williams (1966; 1967), with the particularity of allowing only clusters that are contiguous in geographic space or along time to fuse at any given step. Contiguity can be defined with respect to space or time. Information about spatial contiguity is provided by a connection network among sites, with edges describing the links between connected sites. Clustering with a temporal contiguity constraint is also known as chronological clustering. Information on temporal contiguity can be implicitly provided as the rank positions of observations in the time series. The implementation was mirrored on that found in the hierarchical clustering function hclust of the standard R package stats ( R Core Team 2022). We transcribed that function from Fortran to C and added the functionality to apply constraints when running the function. The implementation is efficient. It is limited mainly by input/output access as massive amounts of memory are potentially needed to store copies of the dissimilarity matrix and update its elements when analyzing large problems. We provided R computer code for plotting results for numbers of clusters.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"2013 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87740464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 11

stringi: Fast and Portable Character String Processing in R 在R中快速和可移植的字符串处理

IF 5.8 2区计算机科学 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Statistical Software

Pub Date : 2022-01-01 DOI: 10.18637/jss.v103.i02

M. Gagolewski

Effective processing of character strings is required at various stages of data analysis pipelines: from data cleansing and preparation, through information extraction, to report generation. Pattern searching, string collation and sorting, normalization, transliteration, and formatting are ubiquitous in text mining, natural language processing, and bioinformatics. This paper discusses and demonstrates how and why stringi, a mature R package for fast and portable handling of string data based on ICU (International Components for Unicode), should be included in each statistician’s or data scientist’s repertoire to complement their numerical computing and data wrangling skills.

在数据分析管道的各个阶段都需要对字符串进行有效的处理:从数据清理和准备，到信息提取，再到报告生成。模式搜索、字符串整理和排序、规范化、音译和格式化在文本挖掘、自然语言处理和生物信息学中无处不在。本文讨论并演示了如何以及为什么应该将stringi(一个成熟的R包，用于基于ICU (Unicode国际组件)的快速和可移植的字符串数据处理)包含在每个统计学家或数据科学家的技能库中，以补充他们的数值计算和数据整理技能。

引用次数: 26

Learning Base R (2nd Edition) 学习基础R(第二版)

IF 5.8 2区计算机科学 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Statistical Software

Pub Date : 2022-01-01 DOI: 10.18637/jss.v103.b01

James E. Helmreich

引用次数: 0

bbl: Boltzmann Bayes Learner for High-Dimensional Inference with Discrete Predictors in R Boltzmann Bayes学习器在R中的离散预测的高维推理

IF 5.8 2区计算机科学 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Statistical Software

Pub Date : 2022-01-01 DOI: 10.18637/jss.v101.i05

J. Woo, Jinhua Wang

引用次数: 0

TransModel: An R Package for Linear Transformation Model with Censored Data TransModel:一个带有删节数据的线性变换模型的R包

IF 5.8 2区计算机科学 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Statistical Software

Pub Date : 2022-01-01 DOI: 10.18637/jss.v101.i09

Jie Zhou, Jiajia Zhang, Wenbin Lu

Linear transformation models, including the proportional hazards model and proportional odds model, under right censoring were discussed by Chen, Jin, and Ying (2002). The asymptotic variance of the estimator they proposed has a closed form and can be obtained easily by plug-in rules, which improves the computational efficiency. We develop an R package TransModel based on Chen’s approach. The detailed usage of the package is discussed, and the function is applied to the Veterans’ Administration lung cancer data.

Chen, Jin, and Ying(2002)讨论了右审查下的线性变换模型，包括比例风险模型和比例几率模型。他们所提出的估计量的渐近方差具有封闭的形式，可以很容易地通过插件规则得到，从而提高了计算效率。我们基于Chen的方法开发了一个R包TransModel。讨论了该软件包的详细用法，并将该功能应用于退伍军人管理局肺癌数据。

引用次数: 1

Spbsampling: An R Package for Spatially Balanced Sampling Spbsampling:一个空间平衡采样的R包

IF 5.8 2区计算机科学 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Statistical Software

Pub Date : 2022-01-01 DOI: 10.18637/jss.v103.c02

Francesco Pantalone, R. Benedetti, Federica Pierismoni

The basic idea underpinning the theory of spatially balanced sampling is that units closer to each other provide less information about a target of inference than units farther apart. Therefore, it should be desirable to select a sample well spread over the population of interest, or a spatially balanced sample . This situation is easily understood in, among many others, environmental, geological, biological, and agricultural surveys, where usually the main feature of the population is to be geo-referenced. Since traditional sampling designs generally do not exploit the spatial features and since it is desirable to take into account the information regarding spatial dependence, several sampling designs have been developed in order to achieve this objective. In this paper, we present the R package Spbsampling , which provides functions in order to perform three specific sampling designs that pursue the aforementioned purpose. In particular, these sampling designs achieve spatially balanced samples using a summary index of the distance matrix. In this sense, the applicability of the package is much wider, as a distance matrix can be defined for units according to variables different than geographical coordinates.

支撑空间平衡抽样理论的基本思想是，距离较近的单位比距离较远的单位提供的关于推断目标的信息较少。因此，最好是选择一个分布在总体上的样本，或者一个空间平衡的样本。这种情况在环境、地质、生物和农业调查中很容易理解，在这些调查中，人口的主要特征通常是地理参考。由于传统的抽样设计通常不利用空间特征，并且由于考虑到有关空间依赖性的信息是可取的，为了实现这一目标，已经开发了几种抽样设计。在本文中，我们介绍了R包Spbsampling，它提供了执行三种特定采样设计的功能，以实现上述目的。特别是，这些采样设计使用距离矩阵的汇总索引来实现空间平衡样本。从这个意义上说，包的适用性要广泛得多，因为距离矩阵可以根据不同于地理坐标的变量来定义单位。

{"title":"Spbsampling: An R Package for Spatially Balanced Sampling","authors":"Francesco Pantalone, R. Benedetti, Federica Pierismoni","doi":"10.18637/jss.v103.c02","DOIUrl":"https://doi.org/10.18637/jss.v103.c02","url":null,"abstract":"The basic idea underpinning the theory of spatially balanced sampling is that units closer to each other provide less information about a target of inference than units farther apart. Therefore, it should be desirable to select a sample well spread over the population of interest, or a spatially balanced sample . This situation is easily understood in, among many others, environmental, geological, biological, and agricultural surveys, where usually the main feature of the population is to be geo-referenced. Since traditional sampling designs generally do not exploit the spatial features and since it is desirable to take into account the information regarding spatial dependence, several sampling designs have been developed in order to achieve this objective. In this paper, we present the R package Spbsampling , which provides functions in order to perform three specific sampling designs that pursue the aforementioned purpose. In particular, these sampling designs achieve spatially balanced samples using a summary index of the distance matrix. In this sense, the applicability of the package is much wider, as a distance matrix can be defined for units according to variables different than geographical coordinates.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"26 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80110451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Feller-Pareto and Related Distributions: Numerical Implementation and Actuarial Applications 费勒-帕累托及相关分布:数值实现和精算应用

IF 5.8 2区计算机科学 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Statistical Software

Pub Date : 2022-01-01 DOI: 10.18637/jss.v103.i06

Christophe Dutang, Vincent Goulet, Nicholas Langevin

Actuaries model insurance claim amounts using heavy tailed probability distributions. They routinely need to evaluate quantities related to these distributions such as quantiles in the far right tail, moments or limited moments. Furthermore, actuaries often resort to simulation to solve otherwise untractable risk evaluation problems. The paper discusses our implementation of support functions for the Feller-Pareto distribution for the R package actuar . The Feller-Pareto defines a large family of heavy tailed distributions encompassing the transformed beta family and many variants of the Pareto distribution.

精算师使用重尾概率分布对保险索赔金额进行建模。他们通常需要评估与这些分布相关的数量，比如最右尾部的分位数、矩或有限矩。此外，精算师经常求助于模拟来解决否则难以处理的风险评估问题。本文讨论了我们对R包精算器的Feller-Pareto分布的支持函数的实现。费勒-帕累托定义了一个大的重尾分布族，其中包括转化的β族和帕累托分布的许多变体。

引用次数: 2

covsim: An R Package for Simulating Non-Normal Data for Structural Equation Models Using Copulas covsim:一个用copula模拟结构方程模型非正态数据的R包

IF 5.8 2区计算机科学 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Statistical Software

Pub Date : 2022-01-01 DOI: 10.18637/jss.v102.i03

Steffen Grønneberg, Njål Foldnes, Katerina M. Marcoulides

In factor analysis and structural equation modeling non-normal data simulation is traditionally performed by specifying univariate skewness and kurtosis together with the target covariance matrix. However, this leaves little control over the univariate distributions and the multivariate copula of the simulated vector. In this paper we explain how a more flexible simulation method called vine-to-anything (VITA) may be obtained from copula-based techniques, as implemented in a new R package, covsim . VITA is based on the concept of a regular vine, where bivariate copulas are coupled together into a full multivariate copula. We illustrate how to simulate continuous and ordinal data for covariance modeling, and how to use the new package discnorm to test for underlying normality in ordinal data. An introduction to copula and vine simulation is provided in the appendix.

在因子分析和结构方程建模中，传统的非正态数据模拟是通过指定单变量偏度和峰度以及目标协方差矩阵来实现的。然而，这对模拟向量的单变量分布和多变量联结几乎没有控制。在本文中，我们解释了如何从基于copula的技术中获得一种更灵活的模拟方法，称为“葡萄到任何东西”(VITA)，该方法在新的R包covsim中实现。VITA是基于一个规则的藤的概念，其中二元连在一起耦合成一个完整的多元连。我们说明了如何模拟连续和有序数据进行协方差建模，以及如何使用新的包异常来测试有序数据中的潜在正态性。附录中提供了对copula和vine模拟的介绍。

引用次数: 5

The poolr Package for Combining Independent and Dependent p Values 用于组合独立和依赖p值的池包

IF 5.8 2区计算机科学 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Statistical Software

Pub Date : 2022-01-01 DOI: 10.18637/jss.v101.i01

Ozan Cinar, W. Viechtbauer

The poolr package provides an implementation of a variety of methods for pooling (i.e., combining) p values, including Fisher’s method, Stouffer’s method, the inverse chisquare method, the binomial test, the Bonferroni method, and Tippett’s method. More importantly, the methods can be adjusted to account for dependence among the tests from which the p values have been derived assuming multivariate normality among the test statistics. All methods can be adjusted based on an estimate of the effective number of tests or by using an empirically-derived null distribution based on pseudo replicates that mimics a proper permutation test. For the Fisher, Stouffer, and inverse chi-square methods, the test statistics can also be directly generalized to account for dependence, leading to Brown’s method, Strube’s method, and the generalized inverse chi-square method. In this paper, we describe the various methods, discuss their implementation in the package, illustrate their use based on several examples, and compare the poolr package with several other packages that can be used to combine p values.

poolr包提供了多种p值池化(即组合)方法的实现，包括Fisher方法、Stouffer方法、逆方法、二项检验、Bonferroni方法和Tippett方法。更重要的是，这些方法可以调整，以解释检验之间的相关性，从检验统计量中得出的p值假设多元正态性。所有方法都可以根据对有效测试数的估计进行调整，或者通过使用基于模拟适当排列测试的伪重复的经验推导的零分布进行调整。对于Fisher、Stouffer和逆卡方方法，检验统计量也可以直接一般化以解释相关性，从而产生Brown的方法、Strube的方法和广义逆卡方方法。在本文中，我们描述了各种方法，讨论了它们在包中的实现，基于几个例子说明了它们的使用，并将poolr包与其他几个可用于组合p值的包进行了比较。

引用次数: 31

首页上一页

下一页尾页

类型

全部化学•材料生命科学医学物理工程技术环境•农林材料科学地球科学法学管理学化学环境科学与生态学计算机科学教育学经济学农林科学人文科学生物学数学物理与天体物理心理学综合性期刊其他工业工程理学历史学农学文学信息工程

数据库

全部 ACS Publications Elsevier ieeexplore Springer The Royal Society of Chemistry Wiley

期刊

Journal of Statistical Software

全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.

﹀