Pub Date: 2025-05-20 | DOI: 10.1080/00031305.2025.2507380
Layna Charlie Dennett, Antony Overstall, Dankmar Böhning
Meta-analysis is a well-established method for integrating results from several independent studies to estimate a common quantity of interest. However, meta-analysis is prone to selection bias, notably when particular studies are systematically excluded, which can bias estimation of the quantity of interest. Motivated by a meta-analysis to estimate the rate of completed suicide after bariatric surgery, in which studies that reported no suicides were excluded, a novel zero-truncated count modeling approach was developed. This approach addresses heterogeneity, both observed and unobserved, through covariate and overdispersion modeling, respectively. Additionally, through the Horvitz-Thompson estimator, an approach is developed to estimate the number of excluded studies, a quantity of potential interest for researchers. Uncertainty quantification for both the estimated suicide rate and the number of excluded studies is achieved through a parametric bootstrapping approach.
Title: Zero-Truncated Modelling in a Meta-Analysis on Suicide Data after Bariatric Surgery (American Statistician)
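The zero-truncated count idea can be sketched numerically. The Python snippet below is a minimal illustration, not the authors' model (which also handles covariates and overdispersion): a zero-truncated Poisson rate is fit by maximum likelihood to hypothetical study counts with person-year offsets, and the Horvitz-Thompson estimator then converts the fitted zero probabilities into an estimate of how many zero-count studies were excluded.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

# Hypothetical data (for illustration only): suicide counts and person-years
# of follow-up for the studies that reported at least one suicide.
counts = np.array([1, 2, 1, 3, 1, 2, 4, 1])
person_years = np.array([5e3, 8e3, 4e3, 2e4, 3e3, 1e4, 2.5e4, 6e3])

def neg_loglik(log_rate):
    """Negative log-likelihood of a zero-truncated Poisson with offsets."""
    lam = np.exp(log_rate) * person_years        # per-study Poisson mean
    return -np.sum(counts * np.log(lam) - lam - gammaln(counts + 1)
                   - np.log1p(-np.exp(-lam)))    # conditioning on Y > 0

res = minimize_scalar(neg_loglik, bounds=(-15, -5), method="bounded")
rate_hat = np.exp(res.x)                         # suicides per person-year

# Horvitz-Thompson: each observed study stands in for 1 / P(Y > 0) studies.
lam_hat = rate_hat * person_years
n_total_hat = np.sum(1.0 / (1.0 - np.exp(-lam_hat)))
n_excluded_hat = n_total_hat - len(counts)
```

Studies with little follow-up have a high chance of observing zero events, so each such observed study represents several excluded ones; that is where the estimated number of missing studies comes from.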
Pub Date: 2025-05-18 | DOI: 10.1080/00031305.2025.2505514
Theo Economou, Daphne Parliari, Aurelio Tobias, Laura Dawkins, Hamish Steptoe, Christophe Sarran, Oliver Stoner, Rachel Lowe, Jos Lelieveld
In this tutorial we present the use of the R package mgcv to implement Distributed Lag Non-Linear Models (DLNMs) in a flexible way. Interpreting smoothing splines as random quantities enables approximate Bayesian inference, which in turn allows uncertainty quantification and comprehensive model checking. We illustrate various modeling situations using open-access epidemiological data in conjunction with simulation experiments. We demonstrate the inclusion of temporal structures and the use of mixture distributions to allow for extreme outliers. Moreover, we demonstrate interactions of the temporal lag structures with other covariates, allowing different lag periods for different covariates. Spatial structures are also demonstrated, including smooth spatial variability and Markov random fields, in addition to hierarchical formulations that allow for unstructured dependency. Posterior predictive simulation is used to ensure models verify well against the data.
Title: Flexible distributed lag models for count data using mgcv (American Statistician)
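The core DLNM ingredient is a matrix of lagged exposures over which a smooth is summed. A language-agnostic sketch of building that matrix (in Python here, since the paper's own code is R):

```python
import numpy as np

def lag_matrix(x, max_lag):
    """Return an (n, max_lag + 1) matrix whose column l holds x lagged by l."""
    n = len(x)
    L = np.full((n, max_lag + 1), np.nan)
    for l in range(max_lag + 1):
        L[l:, l] = x[:n - l]   # value l time steps before each row's date
    return L

temp = np.arange(10.0)         # hypothetical daily temperature series
L = lag_matrix(temp, 3)
# Row t contains temp[t], temp[t-1], temp[t-2], temp[t-3];
# the first max_lag rows are padded with NaN (no full lag history yet).
```

In mgcv itself, such a lag matrix together with a matching matrix of lag indices can be supplied to `s()`/`te()` as a linear functional term, which sums the fitted smooth across the lag columns; that construction is roughly what the tutorial builds on.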
Pub Date: 2025-05-16 | DOI: 10.1080/00031305.2025.2505512
Ryan S. Brill, Abraham J. Wyner
A longstanding question in the judgment and decision making literature is whether experts, even in high-stakes environments, exhibit the same cognitive biases observed in controlled experiments with inexperienced participants. Massey and Thaler (2013) claim to have found an example of bias and irrationality in expert decision making: general managers’ behavior in the National Football League draft pick trade market. They argue that general managers systematically overvalue top draft picks, which generate less surplus value on average than later first-round picks, a phenomenon known as the loser’s curse. Their conclusion hinges on the assumption that general managers should use expected surplus value as their utility function for evaluating draft picks. This assumption, however, is neither explicitly justified nor necessarily aligned with the strategic complexities of constructing a National Football League roster. In this paper, we challenge their framework by considering alternative utility functions, particularly those that emphasize the acquisition of transformational players: those capable of dramatically increasing a team’s chances of winning the Super Bowl. Under a decision rule that prioritizes the probability of acquiring elite players, which we construct from a novel Bayesian hierarchical Beta regression model, general managers’ draft trade behavior appears rational rather than systematically flawed. More broadly, our findings highlight the critical role of carefully specifying a utility function when evaluating the quality of decisions.
Title: The Loser’s Curse and the Critical Role of the Utility Function (American Statistician)
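The utility-function point can be made with a toy calculation. In the sketch below (illustrative numbers, not the paper's fitted Beta regression), two picks have identical expected player value, so expected surplus favors the cheaper later pick, while the probability of landing an elite player favors the top pick because of its heavier right tail:

```python
from scipy.stats import beta

# Hypothetical value distributions on [0, 1] (1 = generational talent);
# parameters, costs, and the elite threshold are assumptions for illustration.
top_pick   = beta(2.0, 5.0)    # same mean as below, but heavier right tail
later_pick = beta(4.0, 10.0)   # same mean, concentrated around it
cost_top, cost_later = 0.30, 0.18   # assumed pick costs on the same scale
elite = 0.6                         # assumed "transformational" threshold

surplus_top   = top_pick.mean()   - cost_top     # negative: the loser's curse
surplus_later = later_pick.mean() - cost_later
p_elite_top   = top_pick.sf(elite)               # P(value > elite threshold)
p_elite_later = later_pick.sf(elite)
```

Under the expected-surplus utility the trade-up looks irrational; under the tail-probability utility the same trade looks sensible. The decision rule, not the data, flips the verdict.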
Pub Date: 2025-05-15 | DOI: 10.1080/00031305.2025.2505507
Haim Bar, Vladimir Pozdnyakov
In his 1996 paper, Talagrand highlighted that the Law of Large Numbers (LLN) for independent random variables can be viewed as a geometric property of multidimensional product spaces. This phenomenon is known as the concentration of measure. To illustrate this profound connection between geometry and probability theory, we consider a seemingly intractable geometric problem in multidimensional Euclidean space and solve it using standard probabilistic tools such as the LLN and the Central Limit Theorem (CLT).
Title: High Dimensional Space Oddity (American Statistician)
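The concentration phenomenon is easy to see by simulation: by the LLN, ||X||²/d for a standard Gaussian vector in dimension d converges to 1, so random high-dimensional points pile up near a sphere of radius √d. A short illustrative check:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000
x = rng.standard_normal((1_000, d))              # 1000 points in R^d
norms = np.linalg.norm(x, axis=1) / np.sqrt(d)   # scaled Euclidean norms
# LLN: ||X||^2 / d = (1/d) * sum(X_i^2) -> E[X_1^2] = 1, so nearly every
# point sits close to the sphere of radius sqrt(d); the CLT describes the
# O(1) fluctuation of ||X|| around that radius.
```

The scaled norms cluster tightly around 1, which is the geometric face of the probabilistic limit theorems the paper exploits.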
Pub Date: 2025-05-06 | DOI: 10.1080/00031305.2025.2501799
Duncan K. Foley, Ellis Scharfenaker
Bayes’ theorem incorporates distinct types of information through the likelihood and the prior. Direct observations of state variables enter the likelihood and modify posterior probabilities through consistent updating. Information in the form of expected values of state variables modifies posterior probabilities by constraining prior probabilities to be consistent with that information. Constraints on the prior can be exact, limiting hypothetical frequency distributions to only those that satisfy the constraints, or approximate, allowing residual deviations from the exact constraint up to some tolerance. When the model parameters and constraint tolerances are known, posterior probabilities follow directly from Bayes’ theorem. When parameters and tolerances are unknown, a prior for them must be specified. When the system is close to statistical equilibrium, the computation of posterior probabilities is simplified by the concentration of the prior on the maximum entropy hypothesis. The relationship between maximum entropy reasoning and Bayes’ theorem, from this point of view, is that maximum entropy reasoning is a special case of Bayesian inference with a constrained entropy-favoring prior.
Title: Bayesian Inference and the Principle of Maximum Entropy (American Statistician)
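The expected-value constraint can be illustrated with Jaynes's classic dice example (a standard textbook case, not drawn from the paper): among all distributions on {1,…,6} with mean 4.5, the maximum-entropy one has the exponential-family (Gibbs) form, with the multiplier chosen to satisfy the constraint:

```python
import numpy as np
from scipy.optimize import brentq

states = np.arange(1, 7)      # faces of a die
target_mean = 4.5             # expected-value constraint

def mean_given(b):
    """Mean of the Gibbs (exponential-family) distribution w ~ exp(b * x)."""
    w = np.exp(b * states)
    return (w / w.sum()) @ states

# Solve for the Lagrange multiplier that makes the constraint exact.
b_star = brentq(lambda b: mean_given(b) - target_mean, -5.0, 5.0)
w = np.exp(b_star * states)
p = w / w.sum()               # the maximum-entropy distribution
```

Because the target mean exceeds the uniform mean of 3.5, the multiplier is positive and the resulting probabilities increase monotonically toward the high faces, exactly the tilt the constraint demands.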
Pub Date: 2025-05-05 | DOI: 10.1080/00031305.2025.2501800
Li-Yen R. Hu, Yulei He, Katherine E. Irimata, Vladislav Beresovsky
Chi-square tests are often employed to examine the association of categorical variables, the homogeneity of proportions between two or more samples, and the goodness-of-fit of a specified distribution. To account for the complex design of survey data, variants of chi-square tests, as well as software packages that implement them, have been developed. Nevertheless, from a survey practitioner’s perspective, there is a lack of applied literature that reviews and compares the available survey chi-square tests and their associated programming and output. This paper aims to fill that gap. Many modern statistical software packages for survey analysis can compute either the Wald chi-square test or the Rao-Scott chi-square test, along with other types of chi-square tests, including the Rao-Scott likelihood ratio chi-square test and the Wald log-linear chi-square test. This paper focuses on these four types of chi-square tests and examines four statistical packages that compute them: SAS®, R, Python, and SUDAAN®. While the same type of test yields similar results across packages, different types of chi-square tests may yield different p-values for the same comparison. Sample programming code is included in the Appendix for readers’ reference.
Title: Much Ado About Survey Tables: A Comparison of Chi-Square Tests and Software to Analyze Categorical Survey Data (American Statistician)
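The essence of the Rao-Scott idea, deflating the Pearson statistic to account for the survey design, can be sketched in a few lines. This is a simplified first-order illustration with an assumed average design effect; the packages discussed in the paper estimate the correction from the survey design itself:

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical survey-weighted 2x2 table (weighted counts, not raw ones).
table = np.array([[120.0,  80.0],
                  [ 90.0, 110.0]])

def pearson_chi2(t):
    """Ordinary Pearson chi-square statistic for a two-way table."""
    row = t.sum(axis=1, keepdims=True)
    col = t.sum(axis=0, keepdims=True)
    expected = row @ col / t.sum()
    return ((t - expected) ** 2 / expected).sum()

X2 = pearson_chi2(table)
deff = 1.8                    # assumed average design effect
X2_rs = X2 / deff             # first-order Rao-Scott style deflation
p_naive = chi2.sf(X2, df=1)   # ignores clustering/weighting: too small
p_rs = chi2.sf(X2_rs, df=1)   # design-adjusted p-value is larger
```

Ignoring the design overstates the effective sample size, which is why the naive p-value is anticonservative relative to the adjusted one.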
Pub Date: 2025-04-18 | DOI: 10.1080/00031305.2025.2475801
Ryan S. Brill, Ronald Yurko, Abraham J. Wyner
The standard mathematical approach to fourth-down decision-making in American football is to make the decision that maximizes estimated win probability. Win probability estimates arise from machine learning models fit from historical data. These models attempt to capture a nuanced relationship between a noisy binary outcome variable and game-state variables replete with interactions and non-linearities from a finite dataset of just a few thousand games. Thus, it is imperative to knit uncertainty quantification into the fourth-down decision procedure; we do so using bootstrapping. We find that uncertainty in the estimated optimal fourth-down decision is far greater than that currently expressed by sports analysts in popular sports media.
Title: Analytics, Have Some Humility: A Statistical View of Fourth-Down Decision Making (American Statistician)
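A stripped-down version of the bootstrap idea (synthetic conversion data and assumed placeholder win probabilities, not the paper's fitted models) shows how resampling propagates estimation uncertainty into the decision: the go/punt recommendation is not unanimous across replicates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic history: 1 = converted on a hypothetical 4th-and-short attempt.
conversions = rng.binomial(1, 0.55, size=300)

# Assumed placeholder win probabilities for the three resulting game states.
wp_convert, wp_fail, wp_punt = 0.58, 0.38, 0.49

B = 2000
go_better = 0
for _ in range(B):
    # Resample the history, re-estimate the conversion rate, re-make the call.
    p = rng.choice(conversions, size=conversions.size, replace=True).mean()
    wp_go = p * wp_convert + (1 - p) * wp_fail
    go_better += wp_go > wp_punt

frac = go_better / B   # share of replicates in which "go" looks better
```

A point estimate would issue a single confident recommendation; the bootstrap reveals that the recommendation flips in a non-trivial fraction of replicates, which is the humility the paper argues for.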
Pub Date: 2025-04-10 | DOI: 10.1080/00031305.2025.2490786
Nathan Hawkins, Gilbert W. Fellingham, Garritt L. Page
This paper introduces a volleyball point-by-point win probability model that updates the probability of winning a set after each play in the set. The covariate informed product partition model (PPMx) is well suited to flexibly include in-set team performance information when making predictions. However, making predictions in real time would be too expensive computationally, as it would require refitting the PPMx for each prediction. Instead, we develop a predictive procedure based on a single training of the PPMx that predicts in real time. We deploy this procedure using data from the 2018 Men’s World Volleyball Championship. The procedure first trains a PPMx model using end-of-set team performance statistics from the round-robin stage of the tournament. Then, based on the PPMx predictive distribution, we predict the win probability after every play of every match in the knockout stages. Finally, we show how the prediction procedure can be enhanced by including pre-set information towards the beginning of the set and the set score towards the end.
Title: Play-by-Play Volleyball Win Probability Model (American Statistician)
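As a point of reference for what a play-by-play set-win curve looks like, here is a deliberately simple iid rally model (a stand-in for intuition, not the PPMx model): dynamic programming over the score gives the probability of winning a race to 25, win by 2, from any scoreline.

```python
from functools import lru_cache

def set_win_prob(a, b, p, target=25):
    """P(team A wins the set) from score (a, b) if A wins each rally w.p. p."""
    q = 1.0 - p

    @lru_cache(maxsize=None)
    def wp(a, b):
        if a >= target and a - b >= 2:
            return 1.0
        if b >= target and b - a >= 2:
            return 0.0
        if a == b and a >= target - 1:
            # deuce: A must pull two rallies ahead first; geometric closed form
            return p * p / (p * p + q * q)
        return p * wp(a + 1, b) + q * wp(a, b + 1)

    return wp(a, b)
```

For example, `set_win_prob(0, 0, 0.5)` is 0.5 by symmetry, and the curve jumps after every rally as the score updates; supplying that per-play update from a rich predictive distribution, rather than a fixed rally probability, is the role the PPMx plays in the paper.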
Pub Date: 2025-04-09 | DOI: 10.1080/00031305.2025.2490305
Haihan Yu
Pedro J. Aphalo. Boca Raton, FL: Chapman & Hall/CRC Press, 2024, xvii + 447 pp., $220.00 (H), ISBN: 978-1-032-51843-5. R programming has become an essential tool for data analysis and statistical com...
Title: Learn R: As a Language, 2nd ed. (American Statistician, book review)
Pub Date: 2025-04-09 | DOI: 10.1080/00031305.2025.2490304
Xiao Hui Tai
Tom Alby. Boca Raton, FL: Chapman & Hall/CRC Press, 2024, xvi + 301 pp., $200.00 (H), ISBN: 978-1-032-50524-4. This book is a comprehensive introduction to data science, with a focus on how it is use...
Title: Data Science in Practice (American Statistician, book review)