
The American Statistician: Latest Articles

Likelihood-Free Parameter Estimation with Neural Bayes Estimators
Pub Date : 2022-08-27 DOI: 10.1080/00031305.2023.2249522
Matthew Sainsbury-Dale, A. Zammit‐Mangion, Raphael Huser
Neural point estimators are neural networks that map data to parameter point estimates. They are fast, likelihood free and, due to their amortised nature, amenable to fast bootstrap-based uncertainty quantification. In this paper, we aim to increase the awareness of statisticians to this relatively new inferential tool, and to facilitate its adoption by providing user-friendly open-source software. We also give attention to the ubiquitous problem of making inference from replicated data, which we address in the neural setting using permutation-invariant neural networks. Through extensive simulation studies we show that these neural point estimators can quickly and optimally (in a Bayes sense) estimate parameters in weakly-identified and highly-parameterised models with relative ease. We demonstrate their applicability through an analysis of extreme sea-surface temperature in the Red Sea where, after training, we obtain parameter estimates and bootstrap-based confidence intervals from hundreds of spatial fields in a fraction of a second.
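The permutation-invariance idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a DeepSets-style construction (an inner map applied to each replicate, mean pooling, then an outer map); the function name `make_deepset_estimator`, the layer sizes, and the random untrained weights are all hypothetical, and this is not the authors' software. Training the weights to minimise a Bayes risk over simulated data is omitted.

```python
import math
import random


def make_deepset_estimator(n_hidden=8, seed=0):
    """Return an untrained permutation-invariant map from replicates to one estimate.

    Hypothetical illustration only: the weights are random, not trained.
    """
    rng = random.Random(seed)
    # Inner map psi: scalar replicate -> hidden feature vector.
    w1 = [rng.gauss(0, 1) for _ in range(n_hidden)]
    b1 = [rng.gauss(0, 1) for _ in range(n_hidden)]
    # Outer map phi: pooled feature vector -> scalar point estimate.
    w2 = [rng.gauss(0, 1) for _ in range(n_hidden)]

    def estimator(replicates):
        m = len(replicates)
        # Apply psi to every replicate, then average over replicates.
        # The average does not depend on the order of the replicates.
        pooled = [
            sum(math.tanh(w * z + b) for z in replicates) / m
            for w, b in zip(w1, b1)
        ]
        return sum(w * h for w, h in zip(w2, pooled))

    return estimator


estimator = make_deepset_estimator()
data = [0.3, -1.2, 2.5, 0.7]
shuffled = [2.5, 0.7, 0.3, -1.2]  # same replicates, different order
est_original = estimator(data)
est_shuffled = estimator(shuffled)
```

Because the pooling step is an average, `est_original` and `est_shuffled` agree up to floating-point rounding, which is the property that makes such networks suitable for replicated data.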
Citations: 5
How Do We Perform a Paired t-Test When We Don’t Know How to Pair?
Pub Date : 2022-08-23 DOI: 10.1080/00031305.2022.2115552
M. Grabchak
We address the question of how to perform a paired t-test in situations where we do not know how to pair the data. Specifically, we discuss approaches for bounding the test statistic of the paired t-test in a way that allows us to recover the results of this test in some cases. We also discuss the relationship between the paired t-test and the independent samples t-test and what happens if we use the latter to approximate the former. Our results are informed by both theoretical results and a simulation study.
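One elementary way to bound the paired t statistic over all possible pairings is sketched below. It is not necessarily the approach taken in the paper. It relies on two facts: the mean difference does not depend on the pairing, and by the rearrangement inequality the sample variance of the differences is smallest when both samples are sorted in the same order and largest when they are sorted in opposite orders. The data values and the function names are made up for illustration.

```python
import math
from itertools import permutations


def paired_t(x, y):
    """Paired t statistic for the pairing (x[i], y[i])."""
    n = len(x)
    d = [a - b for a, b in zip(x, y)]
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


def paired_t_abs_bounds(x, y):
    """Lower and upper bounds on |t| over all pairings of x with y.

    Assumes the same-order pairing does not give constant differences,
    which would make the statistic undefined.
    """
    xs = sorted(x)
    t_same = paired_t(xs, sorted(y))                    # smallest sd of differences
    t_opposite = paired_t(xs, sorted(y, reverse=True))  # largest sd of differences
    return abs(t_opposite), abs(t_same)


# Hypothetical measurements whose true pairing is unknown.
x = [5.1, 6.3, 4.8, 7.0, 5.9]
y = [4.2, 5.0, 4.9, 6.1, 5.5]
lo, hi = paired_t_abs_bounds(x, y)

# Brute-force check over all 5! = 120 pairings.
all_abs_t = [abs(paired_t(x, p)) for p in permutations(y)]
```

If both bounds fall on the same side of the critical value, the test decision is the same for every pairing, which is one sense in which the result of the test can be recovered without knowing the pairs.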
Citations: 0
Distribution-Free Location-Scale Regression
Pub Date : 2022-08-10 DOI: 10.1080/00031305.2023.2203177
Sandra Siegfried, Lucas Kook, T. Hothorn
We introduce a generalized additive model for location, scale, and shape (GAMLSS) next of kin aiming at distribution-free and parsimonious regression modelling for arbitrary outcomes. We replace the strict parametric distribution formulating such a model by a transformation function, which in turn is estimated from data. Doing so not only makes the model distribution-free but also allows limiting the number of linear or smooth model terms to a pair of location-scale predictor functions. We derive the likelihood for continuous, discrete, and randomly censored observations, along with corresponding score functions. A plethora of existing algorithms is leveraged for model estimation, including constrained maximum-likelihood, the original GAMLSS algorithm, and transformation trees. Parameter interpretability in the resulting models is closely connected to model selection. We propose the application of a novel best subset selection procedure to achieve especially simple ways of interpretation. All techniques are motivated and illustrated by a collection of applications from different domains, including crossing and partial proportional hazards, complex count regression, non-linear ordinal regression, and growth curves. All analyses are reproducible with the help of the "tram" add-on package to the R system for statistical computing and graphics.
Citations: 4
Using the Lambert Function to Estimate Shared Frailty Models with a Normally Distributed Random Intercept
Pub Date : 2022-08-08 DOI: 10.1080/00031305.2022.2110939
H. Charvat
Shared frailty models, that is, hazard regression models for censored data including random effects acting multiplicatively on the hazard, are commonly used to analyze time-to-event data possessing a hierarchical structure. When the random effects are assumed to be normally distributed, the cluster-specific marginal likelihood has no closed-form expression. A powerful method for approximating such integrals is the adaptive Gauss-Hermite quadrature (AGHQ). However, this method requires the estimation of the mode of the integrand in the expression defining the cluster-specific marginal likelihood: it is generally obtained through a nested optimization at the cluster level for each evaluation of the likelihood function. In this work, we show that in the case of a parametric shared frailty model including a normal random intercept, the cluster-specific modes can be determined analytically by using the principal branch of the Lambert function, $W_0$. Besides removing the need for the nested optimization procedure, it provides closed-form formulas for the gradient and Hessian of the approximated likelihood making its maximization by Newton-type algorithms convenient and efficient. The Lambert-based AGHQ (LAGHQ) might be applied to other problems involving similar integrals, such as the normally distributed random intercept Poisson model and the computation of probabilities from a Poisson lognormal distribution.
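To see how a Lambert-function solution for the mode can arise, consider a simplified cluster-level log-integrand of the form $d\,b - \mu e^{b} - b^2/(2\sigma^2)$, where $b$ is the random intercept, $d$ is a cluster event count, $\mu > 0$ collects the fixed-effect part, and $\sigma^2$ is the random-effect variance. This form, the symbols, and the function names below are assumptions made for illustration and are not taken from the paper. Setting the derivative $d - \mu e^{b} - b/\sigma^2$ to zero and substituting $b = \sigma^2 d - w$ gives $w e^{w} = \mu\sigma^2 e^{\sigma^2 d}$, so the mode is $\sigma^2 d - W_0(\mu\sigma^2 e^{\sigma^2 d})$. The sketch computes $W_0$ with a plain Newton iteration so that it needs only the standard library.

```python
import math


def lambert_w0(x, tol=1e-14, max_iter=100):
    """Principal branch W_0(x) for x >= 0, via Newton's method on w*exp(w) = x."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # starting value at or above the root for x >= 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol * (1.0 + abs(w)):
            break
    return w


def cluster_mode(d, mu, sigma2):
    """Mode of d*b - mu*exp(b) - b**2 / (2*sigma2) in b (assumed integrand form)."""
    return sigma2 * d - lambert_w0(mu * sigma2 * math.exp(sigma2 * d))


def score(b, d, mu, sigma2):
    """Derivative of the assumed log-integrand with respect to b."""
    return d - mu * math.exp(b) - b / sigma2


# Hypothetical cluster: 3 events, mu = 2.0, random-effect variance 0.5.
mode = cluster_mode(3, 2.0, 0.5)
residual = score(mode, 3, 2.0, 0.5)  # should be numerically zero at the mode
```

The closed form replaces an inner optimisation per cluster with a single function evaluation. Note that `math.exp(sigma2 * d)` can overflow for large arguments, so a production implementation would work on the log scale.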
Citations: 0
Sensitivity Analyses of Clinical Trial Designs: Selecting Scenarios and Summarizing Operating Characteristics
Pub Date : 2022-08-08 DOI: 10.1080/00031305.2023.2216253
L. Han, A. Arfè, L. Trippa
The use of simulation-based sensitivity analyses is fundamental to evaluate and compare candidate designs for future clinical trials. In this context, sensitivity analyses are especially useful to assess the dependence of important design operating characteristics (OCs) with respect to various unknown parameters (UPs). Typical examples of OCs include the likelihood of detecting treatment effects and the average study duration, which depend on UPs that are not known until after the onset of the clinical study, such as the distributions of the primary outcomes and patient profiles. Two crucial components of sensitivity analyses are (i) the choice of a set of plausible simulation scenarios $\{\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_K\}$ and (ii) the list of OCs of interest. We propose a new approach to choose the set of scenarios for inclusion in design sensitivity analyses. Our approach balances the need for simplicity and interpretability of OCs computed across several scenarios with the need to faithfully summarize -- through simulations -- how the OCs vary across all plausible values of the UPs. Our proposal also supports the selection of the number of simulation scenarios to be included in the final sensitivity analysis report. To achieve these goals, we minimize a loss function $\mathcal{L}(\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_K)$ that formalizes whether a specific set of $K$ sensitivity scenarios $\{\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_K\}$ is adequate to summarize how the OCs of the trial design vary across all plausible values of the UPs. Then, we use optimization techniques to select the best set of simulation scenarios to exemplify the OCs of the trial design.
Citations: 0
Global simulation envelopes for diagnostic plots in regression models
Pub Date : 2022-08-03 DOI: 10.1080/00031305.2022.2139294
D. Warton
Residual plots are often used to interrogate regression model assumptions, but interpreting them requires an understanding of how much sampling variation to expect when assumptions are satisfied. In this paper, we propose constructing global envelopes around data (or around trends fitted to data) on residual plots, exploiting recent advances that enable construction of global envelopes around functions by simulation. While the proposed tools are primarily intended as a graphical aid, they can be interpreted as formal tests of model assumptions, which enables the study of their properties via simulation experiments. We considered three model scenarios – fitting a linear model, generalized linear model or generalized linear mixed model – and explored the power of global simulation envelope tests constructed around data on quantile-quantile plots, or around trend lines on residual vs fits plots or scale-location plots. Global envelope tests compared favorably to commonly used tests of assumptions at detecting violations of distributional and linearity assumptions. Freely available R software (ecostats::plotenvelope) enables application of these tools to any fitted model that has methods for the simulate, residuals, and predict functions.
Citations: 0
From Black Box to Shining Spotlight: Using Random Forest Prediction Intervals to Illuminate the Impact of Assumptions in Linear Regression
Pub Date : 2022-07-29 DOI: 10.1080/00031305.2022.2107568
Andrew J. Sage, Yang Liu, Joe Sato
We introduce a pair of Shiny web applications that allow users to visualize random forest prediction intervals alongside those produced by linear regression models. The apps are designed to help undergraduate students deepen their understanding of the role that assumptions play in statistical modeling by comparing and contrasting intervals produced by regression models with those produced by more flexible algorithmic techniques. We describe the mechanics of each approach, illustrate the features of the apps, provide examples highlighting the insights students can gain through their use, and discuss our experience implementing them in an undergraduate class. We argue that, contrary to their reputation as a black box, random forests can be used as a spotlight, for educational purposes, illuminating the role of assumptions in regression models and their impact on the shape, width, and coverage rates of prediction intervals.
Citations: 2
Estimating Knee Movement Patterns of Recreational Runners Across Training Sessions Using Multilevel Functional Regression Models
Pub Date : 2022-07-27 DOI: 10.1080/00031305.2022.2105950
M. Matabuena, M. Karas, S. Riazati, N. Caplan, P. Hayes
Modern wearable monitors and laboratory equipment allow the recording of high-frequency data that can be used to quantify human movement. However, currently, data analysis approaches in these domains remain limited. This article proposes a new framework to analyze biomechanical patterns in sport training data recorded across multiple training sessions using multilevel functional models. We apply the methods to subsecond-level data of knee location trajectories collected from 19 recreational runners during a medium-intensity continuous run (MICR) and a high-intensity interval training (HIIT) session, with multiple steps recorded in each participant-session. We estimate the functional intra-class correlation coefficient to evaluate the reliability of recorded measurements across multiple sessions of the same training type. Furthermore, we obtain a vectorial representation of the three hierarchical levels of the data and visualize them in a low-dimensional space. Finally, we quantify the differences between genders and between two training types using functional multilevel regression models that incorporate covariate information. We provide an overview of the relevant methods and make both data and the R code for all analyses freely available online on GitHub. Thus, this work can serve as a helpful reference for practitioners and a guide for a broader audience of researchers interested in modeling repeated functional measures at different resolution levels in the context of biomechanics and sports science applications.
Citations: 4
On Arbitrarily Underdispersed Discrete Distributions
Pub Date : 2022-07-26 DOI: 10.1080/00031305.2022.2106305
A. Huang
We survey a range of popular generalized count distributions, investigating which (if any) can be arbitrarily underdispersed, that is, its variance can be arbitrarily small compared to its mean. A philosophical implication is that some models failing this simple criterion should not be considered as “statistical models” according to McCullagh’s extendibility criterion. Four practical implications are also discussed: (i) functional independence of parameters, (ii) double generalized linear models, (iii) simulation of underdispersed counts, and (iv) severely underdispersed count regression. We suggest that all future generalizations of the Poisson distribution be tested against this key property.
Citations: 2
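As a small illustration of the dispersion criterion mentioned in the abstract above, the following sketch computes the variance-to-mean ratio (often called the dispersion index) directly from a probability mass function. It is not taken from the paper: the helper names `dispersion_index`, `poisson_pmf`, and `binomial_pmf` are hypothetical, and the Poisson and binomial distributions are used only because their ratios are well known (1 for the Poisson, 1 - p for the binomial), not because the paper necessarily discusses them in this way.

```python
import math


def dispersion_index(pmf, support):
    """Return variance / mean for a pmf summed over a finite support."""
    mean = sum(k * pmf(k) for k in support)
    var = sum((k - mean) ** 2 * pmf(k) for k in support)
    return var / mean


def poisson_pmf(lam):
    """Poisson pmf with rate lam (hypothetical helper for this sketch)."""
    return lambda k: math.exp(-lam) * lam ** k / math.factorial(k)


def binomial_pmf(n, p):
    """Binomial pmf with n trials and success probability p."""
    return lambda k: math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# The Poisson support is truncated at 100; for lam = 4 the omitted tail
# mass is negligible, so the ratio is numerically close to 1.
poisson_ratio = dispersion_index(poisson_pmf(4.0), range(0, 101))

# Binomial(10, 0.4) has mean 4 and variance 2.4, so the ratio is close
# to 0.6 (= 1 - p): an underdispersed count with the same mean.
binomial_ratio = dispersion_index(binomial_pmf(10, 0.4), range(0, 11))

print(round(poisson_ratio, 6))
print(round(binomial_ratio, 6))
```

A ratio below 1 indicates underdispersion relative to the Poisson. A family is arbitrarily underdispersed, in the sense used in the abstract, when this ratio can be pushed as close to 0 as desired.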
Comment on “On the Power of the F-test for Hypotheses in a Linear Model,” by Griffiths and Hill (2022)
Pub Date : 2022-07-03 DOI: 10.1080/00031305.2022.2074540
D. Harville
The authors establish and illustrate some relationships among the noncentrality parameters identified with three $F$ tests $T_1$, $T_2$, and $T_3$ applicable to a setting where the data consist of the realized values of the elements of an $N \times 1$ random vector $y$ that is distributed as $N(X\beta, \sigma^2 I)$ (MVN with mean vector $X\beta$ and variance-covariance matrix $\sigma^2 I$). In what follows, it is shown that these relationships can be established in a relatively simple way that uses only readily available results, that provides insights into the underlying rationale, and that lends itself to some potentially useful extensions. The three $F$ tests can be regarded as pertaining to a $J_1 \times 1$ vector $\tau_1 = R_1 \beta$ and a $J_2 \times 1$ vector $\tau_2 = R_2 \beta$ formed from $J = J_1 + J_2$ linearly independent estimable linear combinations of the elements of $\beta$: for a $J_1 \times 1$ vector of constants $r_1$ and a $J_2 \times 1$ vector of constants $r_2$, $T_1$ is a test of the null hypothesis $\tau = r$, where $\tau = (\tau_1', \tau_2')'$ and $r = (r_1', r_2')'$; $T_2$ is a test of the null hypothesis $\tau_1 = r_1$ (when $\beta$ is unrestricted); and $T_3$ is a test of the null hypothesis $\tau_1 = r_1$ when $\beta$ is subject to the restriction $\tau_2 = r_2$. The noncentrality parameters identified with $T_1$, $T_2$, and $T_3$ are
Citations: 0