Likelihood-Free Parameter Estimation with Neural Bayes Estimators
Pub Date: 2022-08-27 | DOI: 10.1080/00031305.2023.2249522
Matthew Sainsbury-Dale, A. Zammit‐Mangion, Raphael Huser
Neural point estimators are neural networks that map data to parameter point estimates. They are fast, likelihood-free and, due to their amortised nature, amenable to fast bootstrap-based uncertainty quantification. In this paper, we aim to raise awareness among statisticians of this relatively new inferential tool, and to facilitate its adoption by providing user-friendly open-source software. We also give attention to the ubiquitous problem of making inference from replicated data, which we address in the neural setting using permutation-invariant neural networks. Through extensive simulation studies we show that these neural point estimators can quickly and optimally (in a Bayes sense) estimate parameters even in weakly identified and highly parameterised models. We demonstrate their applicability through an analysis of extreme sea-surface temperature in the Red Sea where, after training, we obtain parameter estimates and bootstrap-based confidence intervals from hundreds of spatial fields in a fraction of a second.
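The amortised workflow can be sketched in a few lines: simulate (parameter, data) pairs from the prior and the model, train a network on permutation-invariant summaries of each replicated dataset, then reuse the trained network for near-instant estimation and bootstrapping. The R sketch below, using the nnet package and a toy Normal(theta, 1) model with mean/variance pooling as the invariant summaries, illustrates the idea only; it is not the authors' software.

```r
# Amortised neural point estimation, toy sketch (not the authors' software).
# Model: m iid replicates y_j ~ N(theta, 1); prior theta ~ N(0, 1).
library(nnet)  # single-hidden-layer network; enough for a 1-parameter toy

set.seed(1)
m <- 30                              # replicates per dataset
n_train <- 5000                      # simulated training datasets
theta <- rnorm(n_train)              # draw parameters from the prior
Y <- matrix(rnorm(n_train * m, mean = rep(theta, each = m)),
            ncol = m, byrow = TRUE)

# Permutation-invariant summaries of each replicated dataset
S <- cbind(mean = rowMeans(Y), var = apply(Y, 1, var))

# Train the estimator: summaries -> parameter (the amortisation step)
fit <- nnet(S, theta, size = 8, linout = TRUE, trace = FALSE, maxit = 500)

# Estimation for new data is now a fast forward pass ...
y_new <- rnorm(m, mean = 0.7)
theta_hat <- predict(fit, cbind(mean = mean(y_new), var = var(y_new)))

# ... which makes parametric-bootstrap intervals cheap: resimulate, re-estimate
B <- 200
boot <- replicate(B, {
  y_b <- rnorm(m, mean = theta_hat)
  predict(fit, cbind(mean = mean(y_b), var = var(y_b)))
})
c(estimate = theta_hat, quantile(boot, c(0.025, 0.975)))
```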
{"title":"Likelihood-Free Parameter Estimation with Neural Bayes Estimators","authors":"Matthew Sainsbury-Dale, A. Zammit‐Mangion, Raphael Huser","doi":"10.1080/00031305.2023.2249522","DOIUrl":"https://doi.org/10.1080/00031305.2023.2249522","url":null,"abstract":"Neural point estimators are neural networks that map data to parameter point estimates. They are fast, likelihood free and, due to their amortised nature, amenable to fast bootstrap-based uncertainty quantification. In this paper, we aim to increase the awareness of statisticians to this relatively new inferential tool, and to facilitate its adoption by providing user-friendly open-source software. We also give attention to the ubiquitous problem of making inference from replicated data, which we address in the neural setting using permutation-invariant neural networks. Through extensive simulation studies we show that these neural point estimators can quickly and optimally (in a Bayes sense) estimate parameters in weakly-identified and highly-parameterised models with relative ease. We demonstrate their applicability through an analysis of extreme sea-surface temperature in the Red Sea where, after training, we obtain parameter estimates and bootstrap-based confidence intervals from hundreds of spatial fields in a fraction of a second.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123917230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How Do We Perform a Paired t-Test When We Don’t Know How to Pair?
Pub Date: 2022-08-23 | DOI: 10.1080/00031305.2022.2115552
M. Grabchak
We address the question of how to perform a paired t-test in situations where we do not know how to pair the data. Specifically, we discuss approaches for bounding the test statistic of the paired t-test in a way that allows us to recover the results of this test in some cases. We also discuss the relationship between the paired t-test and the independent-samples t-test, and what happens if we use the latter to approximate the former. Our conclusions are informed by both theoretical results and a simulation study.
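One simple way to bound the paired t statistic: the mean of the within-pair differences equals mean(x) − mean(y) under any pairing, while by the rearrangement inequality the variance of the differences is minimised when both samples are sorted in the same order and maximised when they are sorted in opposite orders. The R sketch below computes the resulting extreme t statistics; it illustrates the bounding idea and is not necessarily the paper's exact procedure.

```r
# Bounds on the paired t statistic when the pairing is unknown (illustration).
# The mean difference is pairing-invariant; the SD of the differences is
# extremised by the concordant (same-order) and discordant (opposite-order)
# pairings, by the rearrangement inequality.
paired_t_bounds <- function(x, y) {
  n <- length(x)
  d_min_sd <- sort(x) - sort(y)                      # concordant: smallest SD
  d_max_sd <- sort(x) - sort(y, decreasing = TRUE)   # discordant: largest SD
  t_stat <- function(d) mean(d) / (sd(d) / sqrt(n))
  c(t_smallest_sd = t_stat(d_min_sd), t_largest_sd = t_stat(d_max_sd))
}

set.seed(1)
x <- rnorm(20, mean = 1); y <- rnorm(20)
paired_t_bounds(x, y)  # any actual pairing gives an SD between these extremes
```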
{"title":"How Do We Perform a Paired t-Test When We Don’t Know How to Pair?","authors":"M. Grabchak","doi":"10.1080/00031305.2022.2115552","DOIUrl":"https://doi.org/10.1080/00031305.2022.2115552","url":null,"abstract":"Abstract We address the question of how to perform a paired t-test in situations where we do not know how to pair the data. Specifically, we discuss approaches for bounding the test statistic of the paired t-test in a way that allows us to recover the results of this test in some cases. We also discuss the relationship between the paired t-test and the independent samples t-test and what happens if we use the latter to approximate the former. Our results are informed by both theoretical results and a simulation study.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129385742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distribution-Free Location-Scale Regression
Pub Date: 2022-08-10 | DOI: 10.1080/00031305.2023.2203177
Sandra Siegfried, Lucas Kook, T. Hothorn
We introduce a next of kin to the generalized additive model for location, scale, and shape (GAMLSS), aimed at distribution-free and parsimonious regression modelling for arbitrary outcomes. We replace the strict parametric distribution formulating such a model by a transformation function, which is in turn estimated from data. Doing so not only makes the model distribution-free but also allows us to limit the number of linear or smooth model terms to a pair of location-scale predictor functions. We derive the likelihood for continuous, discrete, and randomly censored observations, along with corresponding score functions. A plethora of existing algorithms is leveraged for model estimation, including constrained maximum likelihood, the original GAMLSS algorithm, and transformation trees. Parameter interpretability in the resulting models is closely connected to model selection, and we propose the application of a novel best-subset selection procedure to achieve especially simple modes of interpretation. All techniques are motivated and illustrated by a collection of applications from different domains, including crossing and partial proportional hazards, complex count regression, nonlinear ordinal regression, and growth curves. All analyses are reproducible with the help of the "tram" add-on package for the R system for statistical computing and graphics.
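As a flavour of the transformation-model workflow, the sketch below fits a plain distribution-free shift model with tram::BoxCox on simulated skewed data; the location-scale extension described in the abstract adds a scale term, for which the package documentation is the authoritative reference.

```r
# A distribution-free transformation model with the "tram" package (sketch).
# BoxCox() estimates a smooth monotone transformation h of y such that, given
# x, h(y) follows a standard normal model with a linear shift in x; no
# parametric distribution is assumed for y itself.
library(tram)

set.seed(1)
n <- 500
d <- data.frame(x = runif(n))
d$y <- exp(1 + 2 * d$x + rnorm(n))  # skewed outcome; not normal on the y scale

m <- BoxCox(y ~ x, data = d)
coef(m)     # shift effect of x on the transformed scale
logLik(m)
confint(m)  # Wald interval via the default coef/vcov machinery
```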
{"title":"Distribution-Free Location-Scale Regression","authors":"Sandra Siegfried, Lucas Kook, T. Hothorn","doi":"10.1080/00031305.2023.2203177","DOIUrl":"https://doi.org/10.1080/00031305.2023.2203177","url":null,"abstract":"We introduce a generalized additive model for location, scale, and shape (GAMLSS) next of kin aiming at distribution-free and parsimonious regression modelling for arbitrary outcomes. We replace the strict parametric distribution formulating such a model by a transformation function, which in turn is estimated from data. Doing so not only makes the model distribution-free but also allows to limit the number of linear or smooth model terms to a pair of location-scale predictor functions. We derive the likelihood for continuous, discrete, and randomly censored observations, along with corresponding score functions. A plethora of existing algorithms is leveraged for model estimation, including constrained maximum-likelihood, the original GAMLSS algorithm, and transformation trees. Parameter interpretability in the resulting models is closely connected to model selection. We propose the application of a novel best subset selection procedure to achieve especially simple ways of interpretation. All techniques are motivated and illustrated by a collection of applications from different domains, including crossing and partial proportional hazards, complex count regression, non-linear ordinal regression, and growth curves. All analyses are reproducible with the help of the\"tram\"add-on package to the R system for statistical computing and graphics.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133894712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using the Lambert Function to Estimate Shared Frailty Models with a Normally Distributed Random Intercept
Pub Date: 2022-08-08 | DOI: 10.1080/00031305.2022.2110939
H. Charvat
Shared frailty models, that is, hazard regression models for censored data including random effects acting multiplicatively on the hazard, are commonly used to analyze time-to-event data possessing a hierarchical structure. When the random effects are assumed to be normally distributed, the cluster-specific marginal likelihood has no closed-form expression. A powerful method for approximating such integrals is adaptive Gauss-Hermite quadrature (AGHQ). However, this method requires the estimation of the mode of the integrand in the expression defining the cluster-specific marginal likelihood; the mode is generally obtained through a nested optimization at the cluster level for each evaluation of the likelihood function. In this work, we show that in the case of a parametric shared frailty model including a normal random intercept, the cluster-specific modes can be determined analytically by using the principal branch of the Lambert W function, $W_0$. Besides removing the need for the nested optimization procedure, this result provides closed-form formulas for the gradient and Hessian of the approximated likelihood, making its maximization by Newton-type algorithms convenient and efficient. The Lambert-based AGHQ (LAGHQ) might be applied to other problems involving similar integrals, such as the normally distributed random intercept Poisson model and the computation of probabilities from a Poisson lognormal distribution.
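To see how the Lambert W function yields the mode in closed form, consider the Poisson model with a normal random intercept mentioned at the end of the abstract: for a cluster with count total $S = \sum_i y_i$ and linear-predictor total $E = \sum_i e^{\eta_i}$, the log-integrand is $f(b) = Sb - Ee^b - b^2/(2\sigma^2)$ up to a constant, and setting $f'(b) = 0$ gives $b^* = \sigma^2 S - W_0(\sigma^2 E\, e^{\sigma^2 S})$. A minimal R check follows (hand-rolled $W_0$ solver; CRAN's lamW package offers lamW::lambertW0 as an alternative). This is a worked special case, not the paper's general frailty-model formulas.

```r
# Closed-form mode of the Poisson/normal-random-intercept integrand via W0.
# Solve w * exp(w) = z for the principal branch by Newton's method (z > 0).
lambertW0 <- function(z, tol = 1e-12) {
  w <- log(1 + z)                    # decent starting value for z > 0
  repeat {
    delta <- (w * exp(w) - z) / (exp(w) * (1 + w))
    w <- w - delta
    if (abs(delta) < tol) return(w)
  }
}

sigma2 <- 1.5   # random-intercept variance
S <- 7          # sum of Poisson counts in the cluster
E <- 3          # sum of exp(linear predictor) over the cluster

b_star <- sigma2 * S - lambertW0(sigma2 * E * exp(sigma2 * S))

# Verify: the score S - E*exp(b) - b/sigma2 vanishes at the mode
S - E * exp(b_star) - b_star / sigma2   # ~ 0 up to numerical tolerance
```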
{"title":"Using the Lambert Function to Estimate Shared Frailty Models with a Normally Distributed Random Intercept","authors":"H. Charvat","doi":"10.1080/00031305.2022.2110939","DOIUrl":"https://doi.org/10.1080/00031305.2022.2110939","url":null,"abstract":"Abstract Shared frailty models, that is, hazard regression models for censored data including random effects acting multiplicatively on the hazard, are commonly used to analyze time-to-event data possessing a hierarchical structure. When the random effects are assumed to be normally distributed, the cluster-specific marginal likelihood has no closed-form expression. A powerful method for approximating such integrals is the adaptive Gauss-Hermite quadrature (AGHQ). However, this method requires the estimation of the mode of the integrand in the expression defining the cluster-specific marginal likelihood: it is generally obtained through a nested optimization at the cluster level for each evaluation of the likelihood function. In this work, we show that in the case of a parametric shared frailty model including a normal random intercept, the cluster-specific modes can be determined analytically by using the principal branch of the Lambert function, . Besides removing the need for the nested optimization procedure, it provides closed-form formulas for the gradient and Hessian of the approximated likelihood making its maximization by Newton-type algorithms convenient and efficient. The Lambert-based AGHQ (LAGHQ) might be applied to other problems involving similar integrals, such as the normally distributed random intercept Poisson model and the computation of probabilities from a Poisson lognormal distribution.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127286016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sensitivity Analyses of Clinical Trial Designs: Selecting Scenarios and Summarizing Operating Characteristics
Pub Date: 2022-08-08 | DOI: 10.1080/00031305.2023.2216253
L. Han, A. Arfè, L. Trippa
The use of simulation-based sensitivity analyses is fundamental to evaluate and compare candidate designs for future clinical trials. In this context, sensitivity analyses are especially useful to assess the dependence of important design operating characteristics (OCs) on various unknown parameters (UPs). Typical examples of OCs include the likelihood of detecting treatment effects and the average study duration, which depend on UPs that are not known until after the onset of the clinical study, such as the distributions of the primary outcomes and patient profiles. Two crucial components of sensitivity analyses are (i) the choice of a set of plausible simulation scenarios $\{\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_K\}$ and (ii) the list of OCs of interest. We propose a new approach to choose the set of scenarios for inclusion in design sensitivity analyses. Our approach balances the need for simplicity and interpretability of OCs computed across several scenarios with the need to faithfully summarize, through simulations, how the OCs vary across all plausible values of the UPs. Our proposal also supports the selection of the number of simulation scenarios to be included in the final sensitivity analysis report. To achieve these goals, we minimize a loss function $\mathcal{L}(\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_K)$ that formalizes whether a specific set of $K$ sensitivity scenarios $\{\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_K\}$ is adequate to summarize how the OCs of the trial design vary across all plausible values of the UPs. Then, we use optimization techniques to select the best set of simulation scenarios to exemplify the OCs of the trial design.
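One plausible instantiation of such a loss (an assumption for illustration, not necessarily the authors' choice) scores a candidate set of $K$ scenarios by how well the OC of the nearest selected scenario approximates the OC at every plausible UP value, turning scenario selection into a k-medoids-style problem. A toy R sketch for the power of a two-arm trial:

```r
# Scenario selection as a k-medoids-style problem (one toy instantiation;
# the paper's loss L(theta_1, ..., theta_K) may differ).
# UP: true effect size delta. OC: power of a two-sample z-test, n per arm.
oc_power <- function(delta, n = 100, alpha = 0.025) {
  pnorm(delta * sqrt(n / 2) - qnorm(1 - alpha))
}

deltas <- seq(0, 0.8, by = 0.01)   # grid of plausible UP values
oc     <- oc_power(deltas)

# Loss: mean squared gap between each grid OC and its nearest scenario's OC
loss <- function(idx) mean(apply(outer(oc, oc[idx], "-")^2, 1, min))

# Greedy forward selection of K = 3 scenarios from the grid
K <- 3
chosen <- integer(0)
for (k in seq_len(K)) {
  cand <- setdiff(seq_along(deltas), chosen)
  best <- cand[which.min(sapply(cand, function(j) loss(c(chosen, j))))]
  chosen <- c(chosen, best)
}
data.frame(delta = deltas[chosen], power = round(oc[chosen], 3))
```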
{"title":"Sensitivity Analyses of Clinical Trial Designs: Selecting Scenarios and Summarizing Operating Characteristics","authors":"L. Han, A. Arfè, L. Trippa","doi":"10.1080/00031305.2023.2216253","DOIUrl":"https://doi.org/10.1080/00031305.2023.2216253","url":null,"abstract":"The use of simulation-based sensitivity analyses is fundamental to evaluate and compare candidate designs for future clinical trials. In this context, sensitivity analyses are especially useful to assess the dependence of important design operating characteristics (OCs) with respect to various unknown parameters (UPs). Typical examples of OCs include the likelihood of detecting treatment effects and the average study duration, which depend on UPs that are not known until after the onset of the clinical study, such as the distributions of the primary outcomes and patient profiles. Two crucial components of sensitivity analyses are (i) the choice of a set of plausible simulation scenarios ${boldsymbol{theta}_1,...,boldsymbol{theta}_K}$ and (ii) the list of OCs of interest. We propose a new approach to choose the set of scenarios for inclusion in design sensitivity analyses. Our approach balances the need for simplicity and interpretability of OCs computed across several scenarios with the need to faithfully summarize -- through simulations -- how the OCs vary across all plausible values of the UPs. Our proposal also supports the selection of the number of simulation scenarios to be included in the final sensitivity analysis report. To achieve these goals, we minimize a loss function $mathcal{L}(boldsymbol{theta}_1,...,boldsymbol{theta}_K)$ that formalizes whether a specific set of $K$ sensitivity scenarios ${boldsymbol{theta}_1,...,boldsymbol{theta}_K}$ is adequate to summarize how the OCs of the trial design vary across all plausible values of the UPs. Then, we use optimization techniques to select the best set of simulation scenarios to exemplify the OCs of the trial design.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134123574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Global simulation envelopes for diagnostic plots in regression models
Pub Date: 2022-08-03 | DOI: 10.1080/00031305.2022.2139294
D. Warton
Residual plots are often used to interrogate regression model assumptions, but interpreting them requires an understanding of how much sampling variation to expect when assumptions are satisfied. In this paper, we propose constructing global envelopes around data (or around trends fitted to data) on residual plots, exploiting recent advances that enable the construction of global envelopes around functions by simulation. While the proposed tools are primarily intended as a graphical aid, they can be interpreted as formal tests of model assumptions, which enables the study of their properties via simulation experiments. We considered three model scenarios (fitting a linear model, generalized linear model, or generalized linear mixed model) and explored the power of global simulation envelope tests constructed around data on quantile-quantile plots, or around trend lines on residual vs. fits plots or scale-location plots. Global envelope tests compared favorably to commonly used tests of assumptions at detecting violations of distributional and linearity assumptions. Freely available R software (ecostats::plotenvelope) enables application of these tools to any fitted model that has methods for the simulate, residuals, and predict functions.
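The basic mechanics can be hand-rolled for a normal QQ-plot: simulate new responses from the fitted model, refit, and collect the sorted residuals; the pointwise extremes of the simulated curves then form a crude simultaneous envelope. The base-R sketch below is illustrative only; ecostats::plotenvelope implements the refined global envelopes of the paper.

```r
# A simple simulation envelope for a QQ-plot of residuals (illustrative;
# ecostats::plotenvelope implements the paper's refined global envelopes).
set.seed(1)
d   <- data.frame(x = runif(100))
d$y <- 1 + 2 * d$x + rnorm(100)
fit <- lm(y ~ x, data = d)

nsim <- 199
sims <- simulate(fit, nsim = nsim)        # new responses from the fitted model
res_sorted <- sapply(seq_len(nsim), function(i) {
  d2 <- d; d2$y <- sims[[i]]
  sort(residuals(lm(y ~ x, data = d2)))   # refit and sort the residuals
})

lo <- apply(res_sorted, 1, min)           # min-max envelope across simulations
hi <- apply(res_sorted, 1, max)

q <- qnorm(ppoints(nrow(d)))
plot(q, sort(residuals(fit)), xlab = "Normal quantiles", ylab = "Sorted residuals")
lines(q, lo, lty = 2); lines(q, hi, lty = 2)
# Observed residuals escaping the envelope signal an assumption violation.
```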
{"title":"Global simulation envelopes for diagnostic plots in regression models","authors":"D. Warton","doi":"10.1080/00031305.2022.2139294","DOIUrl":"https://doi.org/10.1080/00031305.2022.2139294","url":null,"abstract":"Residual plots are often used to interrogate regression model assumptions, but interpreting them requires an understanding of how much sampling variation to expect when assumptions are satisfied. In this paper, we propose constructing global envelopes around data (or around trends fitted to data) on residual plots, exploiting recent advances that enable construction of global envelopes around functions by simulation. While the proposed tools are primarily intended as a graphical aid, they can be interpreted as formal tests of model assumptions, which enables the study of their properties via simulation experiments. We considered three model scenarios – fitting a linear model, generalized linear model or generalized linear mixed model – and explored the power of global simulation envelope tests constructed around data on quantile-quantile plots, or around trend lines on residual vs fits plots or scale-location plots. Global envelope tests compared favorably to commonly used tests of assumptions at detecting violations of distributional and linearity assumptions. Freely available R software ( ecostats::plotenvelope ) enables application of these tools to any fitted model that has methods for the simulate , residuals and predict functions.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114842235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
From Black Box to Shining Spotlight: Using Random Forest Prediction Intervals to Illuminate the Impact of Assumptions in Linear Regression
Pub Date: 2022-07-29 | DOI: 10.1080/00031305.2022.2107568
Andrew J. Sage, Yang Liu, Joe Sato
We introduce a pair of Shiny web applications that allow users to visualize random forest prediction intervals alongside those produced by linear regression models. The apps are designed to help undergraduate students deepen their understanding of the role that assumptions play in statistical modeling by comparing and contrasting intervals produced by regression models with those produced by more flexible algorithmic techniques. We describe the mechanics of each approach, illustrate the features of the apps, provide examples highlighting the insights students can gain through their use, and discuss our experience implementing them in an undergraduate class. We argue that, contrary to their reputation as a black box, random forests can serve, for educational purposes, as a spotlight illuminating the role of assumptions in regression models and their impact on the shape, width, and coverage rates of prediction intervals.
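The comparison behind the apps is easy to reproduce at the console: a parametric prediction interval from lm() next to a quantile-regression-forest interval, here via the ranger package (the apps themselves may construct their intervals differently).

```r
# Linear-model vs. random-forest prediction intervals (console sketch;
# the Shiny apps may construct their intervals differently).
library(ranger)   # random forests with quantile-regression support

set.seed(1)
n <- 500
d <- data.frame(x = runif(n, -2, 2))
d$y <- sin(2 * d$x) + rnorm(n, sd = 0.3)   # nonlinear truth

new <- data.frame(x = c(-1, 0, 1))

# Parametric 90% interval: relies on linearity plus normal, constant-variance errors
lm_fit <- lm(y ~ x, data = d)
predict(lm_fit, new, interval = "prediction", level = 0.90)

# Quantile regression forest: intervals from conditional outcome quantiles
rf_fit <- ranger(y ~ x, data = d, quantreg = TRUE)
predict(rf_fit, data = new, type = "quantiles",
        quantiles = c(0.05, 0.5, 0.95))$predictions
```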
{"title":"From Black Box to Shining Spotlight: Using Random Forest Prediction Intervals to Illuminate the Impact of Assumptions in Linear Regression","authors":"Andrew J. Sage, Yang Liu, Joe Sato","doi":"10.1080/00031305.2022.2107568","DOIUrl":"https://doi.org/10.1080/00031305.2022.2107568","url":null,"abstract":"Abstract We introduce a pair of Shiny web applications that allow users to visualize random forest prediction intervals alongside those produced by linear regression models. The apps are designed to help undergraduate students deepen their understanding of the role that assumptions play in statistical modeling by comparing and contrasting intervals produced by regression models with those produced by more flexible algorithmic techniques. We describe the mechanics of each approach, illustrate the features of the apps, provide examples highlighting the insights students can gain through their use, and discuss our experience implementing them in an undergraduate class. We argue that, contrary to their reputation as a black box, random forests can be used as a spotlight, for educational purposes, illuminating the role of assumptions in regression models and their impact on the shape, width, and coverage rates of prediction intervals.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115726632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimating Knee Movement Patterns of Recreational Runners Across Training Sessions Using Multilevel Functional Regression Models
Pub Date: 2022-07-27 | DOI: 10.1080/00031305.2022.2105950
M. Matabuena, M. Karas, S. Riazati, N. Caplan, P. Hayes
Modern wearable monitors and laboratory equipment allow the recording of high-frequency data that can be used to quantify human movement. Currently, however, data analysis approaches in these domains remain limited. This article proposes a new framework to analyze biomechanical patterns in sport training data recorded across multiple training sessions using multilevel functional models. We apply the methods to subsecond-level data of knee location trajectories collected in 19 recreational runners during a medium-intensity continuous run (MICR) and a high-intensity interval training (HIIT) session, with multiple steps recorded in each participant-session. We estimate functional intraclass correlation coefficients to evaluate the reliability of recorded measurements across multiple sessions of the same training type. Furthermore, we obtain a vectorial representation of the three hierarchical levels of the data and visualize it in a low-dimensional space. Finally, we quantify the differences between genders and between the two training types using functional multilevel regression models that incorporate covariate information. We provide an overview of the relevant methods and make both the data and the R code for all analyses freely available online on GitHub. This work can thus serve as a helpful reference for practitioners and as a guide for a broader audience of researchers interested in modeling repeated functional measures at different resolution levels in the context of biomechanics and sports science applications.
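A functional ICC can be approximated pointwise: at each grid point of the curves, decompose the variance into between-subject and within-subject components and take their ratio. The toy R sketch below does this with a one-way random-effects decomposition on simulated curves; it is a simplified stand-in for the multilevel functional models used in the paper, with all data and dimensions invented for illustration.

```r
# Pointwise functional ICC on simulated curves (simplified stand-in for the
# paper's multilevel functional models).
set.seed(1)
n_subj <- 19; n_rep <- 10
grid <- seq(0, 1, length.out = 50)

a <- rnorm(n_subj, sd = 0.8)             # random amplitude per subject
subj_curve <- outer(a, sin(pi * grid))   # smooth subject-level mean curves

# Replicates add within-subject noise around each subject's curve
curves <- do.call(rbind, lapply(seq_len(n_subj), function(i) {
  t(replicate(n_rep, subj_curve[i, ] + rnorm(length(grid), sd = 0.4)))
}))
subj <- factor(rep(seq_len(n_subj), each = n_rep))

# One-way ANOVA variance components at each grid point: ICC(t) = sB / (sB + sW)
icc <- apply(curves, 2, function(y) {
  ms  <- anova(lm(y ~ subj))[["Mean Sq"]]
  s_b <- max((ms[1] - ms[2]) / n_rep, 0)  # between-subject variance component
  s_b / (s_b + ms[2])
})
plot(grid, icc, type = "l", ylim = c(0, 1), xlab = "t", ylab = "Pointwise ICC")
```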
{"title":"Estimating Knee Movement Patterns of Recreational Runners Across Training Sessions Using Multilevel Functional Regression Models","authors":"M. Matabuena, M. Karas, S. Riazati, N. Caplan, P. Hayes","doi":"10.1080/00031305.2022.2105950","DOIUrl":"https://doi.org/10.1080/00031305.2022.2105950","url":null,"abstract":"Abstract Modern wearable monitors and laboratory equipment allow the recording of high-frequency data that can be used to quantify human movement. However, currently, data analysis approaches in these domains remain limited. This article proposes a new framework to analyze biomechanical patterns in sport training data recorded across multiple training sessions using multilevel functional models. We apply the methods to subsecond-level data of knee location trajectories collected in 19 recreational runners during a medium-intensity continuous run (MICR) and a high-intensity interval training (HIIT) session, with multiple steps recorded in each participant-session. We estimate functional intra-class correlation coefficient to evaluate the reliability of recorded measurements across multiple sessions of the same training type. Furthermore, we obtained a vectorial representation of the three hierarchical levels of the data and visualize them in a low-dimensional space. Finally, we quantified the differences between genders and between two training types using functional multilevel regression models that incorporate covariate information. We provide an overview of the relevant methods and make both data and the R code for all analyses freely available online on GitHub. Thus, this work can serve as a helpful reference for practitioners and guide for a broader audience of researchers interested in modeling repeated functional measures at different resolution levels in the context of biomechanics and sports science applications.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114769883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On Arbitrarily Underdispersed Discrete Distributions
Pub Date: 2022-07-26 | DOI: 10.1080/00031305.2022.2106305
A. Huang
We survey a range of popular generalized count distributions, investigating which (if any) can be arbitrarily underdispersed, that is, have a variance that can be made arbitrarily small relative to the mean. A philosophical implication is that some models failing this simple criterion should not be considered “statistical models” according to McCullagh’s extendibility criterion. Four practical implications are also discussed: (i) functional independence of parameters, (ii) double generalized linear models, (iii) simulation of underdispersed counts, and (iv) severely underdispersed count regression. We suggest that all future generalizations of the Poisson distribution be tested against this key property.
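The property is easy to probe numerically for any candidate family: compute the mean and variance from a truncated pmf and see how small the variance-to-mean ratio can be driven while holding the mean fixed. A sketch for the Conway-Maxwell-Poisson family, one of the usual suspects in such surveys; the sketch only probes the behaviour numerically and does not settle the paper's question of which families pass the criterion.

```r
# Numerically probing underdispersion for the Conway-Maxwell-Poisson family.
# pmf: p(y) proportional to lambda^y / (y!)^nu; larger nu = less dispersion.
cmp_moments <- function(log_lambda, nu, ymax = 500) {
  y <- 0:ymax
  logp <- y * log_lambda - nu * lgamma(y + 1)
  p <- exp(logp - max(logp)); p <- p / sum(p)   # normalize the truncated pmf
  mu <- sum(y * p)
  c(mean = mu, var = sum((y - mu)^2 * p))
}

# Drive nu up while re-tuning lambda to hold the mean at 5
for (nu in c(1, 2, 5, 10, 20)) {
  ll <- uniroot(function(l) cmp_moments(l, nu)["mean"] - 5, c(-10, 60))$root
  m  <- cmp_moments(ll, nu)
  cat(sprintf("nu = %4.1f  mean = %.3f  var/mean = %.4f\n",
              nu, m["mean"], m["var"] / m["mean"]))
}
```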
{"title":"On Arbitrarily Underdispersed Discrete Distributions","authors":"A. Huang","doi":"10.1080/00031305.2022.2106305","DOIUrl":"https://doi.org/10.1080/00031305.2022.2106305","url":null,"abstract":"Abstract We survey a range of popular generalized count distributions, investigating which (if any) can be arbitrarily underdispersed, that is, its variance can be arbitrarily small compared to its mean. A philosophical implication is that some models failing this simple criterion should not be considered as “statistical models” according to McCullagh’s extendibility criterion. Four practical implications are also discussed: (i) functional independence of parameters, (ii) double generalized linear models, (iii) simulation of underdispersed counts, and (iv) severely underdispersed count regression. We suggest that all future generalizations of the Poisson distribution be tested against this key property.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133149851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comment on “On the Power of the F-test for Hypotheses in a Linear Model,” by Griffiths and Hill (2022)
Pub Date: 2022-07-03 | DOI: 10.1080/00031305.2022.2074540
D. Harville
The authors establish and illustrate some relationships among the noncentrality parameters identified with three F tests $T_1$, $T_2$, and $T_3$ applicable to a setting where the data consist of the realized values of the elements of an $N \times 1$ random vector $y$ that is distributed as $N(X\beta, \sigma^2 I)$ (multivariate normal with mean vector $X\beta$ and variance-covariance matrix $\sigma^2 I$). In what follows, it is shown that these relationships can be established in a relatively simple way that uses only readily available results, that provides insights into the underlying rationale, and that lends itself to some potentially useful extensions. The three F tests can be regarded as pertaining to a $J_1 \times 1$ vector $\tau_1 = R_1\beta$ and a $J_2 \times 1$ vector $\tau_2 = R_2\beta$ formed from $J = J_1 + J_2$ linearly independent estimable linear combinations of the elements of $\beta$: for a $J_1 \times 1$ vector of constants $r_1$ and a $J_2 \times 1$ vector of constants $r_2$, $T_1$ is a test of the null hypothesis $\tau = r$, where $\tau = (\tau_1', \tau_2')'$ and $r = (r_1', r_2')'$; $T_2$ is a test of the null hypothesis $\tau_1 = r_1$ (when $\beta$ is unrestricted); and $T_3$ is a test of the null hypothesis $\tau_1 = r_1$ when $\beta$ is subject to the restriction $\tau_2 = r_2$. The noncentrality parameters identified with $T_1$, $T_2$, and $T_3$ are
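For concreteness, the standard noncentrality parameter of the F test of a general linear hypothesis $H_0: R\beta = r$ in this model is $\lambda = (R\beta - r)'[R(X'X)^{-1}R']^{-1}(R\beta - r)/\sigma^2$. The R sketch below evaluates it for $T_1$ and $T_2$ on an invented design; the parameter for $T_3$ involves the restricted model, and the relationships among the three are the subject of the comment itself, so they are not derived here.

```r
# Noncentrality parameter of the F test of H0: R beta = r in y ~ N(X beta, s2 I):
#   lambda = (R b - r)' [R (X'X)^{-1} R']^{-1} (R b - r) / s2
noncentrality <- function(R, r, X, beta, sigma2) {
  d <- R %*% beta - r
  M <- R %*% solve(crossprod(X)) %*% t(R)
  drop(t(d) %*% solve(M, d)) / sigma2
}

set.seed(1)
N <- 50
X <- cbind(1, rnorm(N), rnorm(N), rnorm(N))
beta <- c(1, 0.5, -0.3, 0.2); sigma2 <- 1

R1 <- rbind(c(0, 1, 0, 0))                  # tau_1 = R1 beta (J1 = 1)
R2 <- rbind(c(0, 0, 1, 0), c(0, 0, 0, 1))   # tau_2 = R2 beta (J2 = 2)
r1 <- 0; r2 <- c(0, 0)

lambda_T1 <- noncentrality(rbind(R1, R2), c(r1, r2), X, beta, sigma2)  # tau = r
lambda_T2 <- noncentrality(R1, r1, X, beta, sigma2)                    # tau_1 = r1
c(lambda_T1 = lambda_T1, lambda_T2 = lambda_T2)
```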
{"title":"Comment on “On the Power of the F-test for Hypotheses in a Linear Model,” by Griffiths and Hill (2022)","authors":"D. Harville","doi":"10.1080/00031305.2022.2074540","DOIUrl":"https://doi.org/10.1080/00031305.2022.2074540","url":null,"abstract":"The authors establish and illustrate some relationships among the noncentrality parameters identified with three F tests T 1 , T 2 , and T 3 applicable to a setting where the data consist of the realized values of the elements of an N × 1 random vector y that is distributed as N ( X β , σ 2 I ) (MVN with mean vector X β and variance-covariance matrix σ 2 I ). In what follows, it is shown that these relationships can be established in a relatively simple way that uses only readily available results, that provides insights into the underlying rationale, and that lends itself to some potentially useful extensions. The three F -tests can be regarded as pertaining to a J 1 × 1 vector τ 1 = R 1 β and a J 2 × 1 vector τ 2 = R 2 β formed from J = J 1 + J 2 linearly independent estimable linear combinations of the elements of β : for a J 1 × 1 vector of constants r 1 and a J 2 × 1 vector of constants r 2 , T 1 is a test of the null hypothesis τ = r where τ = ( τ (cid:2) 1 , τ (cid:2) 2 ) (cid:2) and r = ( r (cid:2) 1 , r (cid:2) 2 ) (cid:2) , T 2 is a test of the null hypothesis τ 1 = r 1 (when β is unrestricted), and T 3 is a test of the null hypothesis τ 1 = r 1 when β is subject to the restriction τ 2 = r 2 . The noncentrality parameters identified with T 1 , T 2 , and T 3 are","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131449340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}