Pub Date : 2025-12-22 DOI: 10.1080/00031305.2025.2604812
Amanda K. Glazer, Layla Parast, Mevin B. Hooten
In American football, rushing, passing, and receiving yards are recorded as whole numbers, but using a unique method of rounding, even though the true yardage is continuous and recorded precisely on the field. This rounding introduces measurement error that is systematically ignored in most statistical analyses of these data. Beyond rounding, football yardage presents additional challenges: it can take on negative values and is strongly skewed. These characteristics complicate distributional assumptions and propagate rounding effects. We illustrate the consequences of these issues using data from running backs during the 2023 National Football League regular season. We show that appropriately modeling play-level yardage as a discrete, skewed, and possibly negative quantity, without access to the true values, is important to reconcile the approach with the data generation process. We compare candidate models that correctly incorporate rounding from a model checking and validation perspective. Our findings underscore the broader importance of accounting for discretization and asymmetry in sports analytics and other fields, where recorded data may mask the underlying measurement process in ways that meaningfully affect statistical conclusions.
Title: Beyond the Yard Line: Accommodating Rounded Sports Data in Statistical Models
Journal: American Statistician. URL: https://doi.org/10.1080/00031305.2025.2604812
Pub Date : 2025-12-05 DOI: 10.1080/00031305.2025.2595972
Lily Agranat-Tamir, Kennedy D. Agwamba, Jazlyn A. Mooney, Noah A. Rosenberg
Title: Shared ancestors and the birthday problem
Journal: American Statistician. URL: https://doi.org/10.1080/00031305.2025.2595972
Pub Date : 2025-12-04 DOI: 10.1080/00031305.2025.2595980
David Winkelmann, Rouven Michels
Title: Momentum effects in team sports: analyzing the interplay between offense and defense in the NBA
Journal: American Statistician. URL: https://doi.org/10.1080/00031305.2025.2595980
Pub Date : 2025-11-19 DOI: 10.1080/00031305.2025.2590127
Yipeng Wang, Peihua Qiu
Title: Forest expression of networks and their applications
Journal: American Statistician. URL: https://doi.org/10.1080/00031305.2025.2590127
Pub Date : 2025-11-14 DOI: 10.1080/00031305.2025.2588128
Louis Davis, Boris Baeumer, Ting Wang
Most point process models for earthquakes in the literature assume that the magnitude is independent and identically distributed. This potentially hinders the ability of the model to describe the main features of data sets containing multiple earthquake mainshock aftershock sequences in succession. This study presents a novel multivariate fractional Hawkes process model designed to capture magnitude dependent triggering behaviour by incorporating history dependence into the magnitude distribution. This is done by discretising the magnitude range into disjoint intervals and modelling events with magnitude in these ranges as the subprocesses of a mutually exciting Hawkes process using the Mittag-Leffler density as the kernel function so that the point process has a history dependent mark distribution. We apply this model to two data sets, Japan and the Middle America Trench, both containing multiple mainshock aftershock sequences and compare it to the existing ETAS model by using information criteria, residual diagnostics and retrospective prediction performance. We find that for both data sets all metrics indicate that the multivariate fractional Hawkes process performs favourably against the ETAS model due to its history dependent magnitude distribution. Furthermore, we are able to infer characteristics of the data sets that cannot be inferred from the ETAS model.
Title: A Multivariate Fractional Hawkes Process for Multiple Earthquake Mainshock Aftershock Sequences
Journal: American Statistician. URL: https://doi.org/10.1080/00031305.2025.2588128
Pub Date : 2025-11-14 DOI: 10.1080/00031305.2025.2588131
Mathew Chandy, Elizabeth D. Schifano, Jun Yan, Xianyang Zhang
The Kolmogorov–Smirnov (KS) test is a widely used statistical test that assesses the conformity of a sample to a specified distribution. Its efficacy, however, diminishes with serially dependent data and when parameters within the hypothesized distribution are unknown. For independent data, parametric and nonparametric bootstrap procedures are available to adjust for estimated parameters. For serially dependent stationary data, parametric bootstrap has been developed with a working serial dependence structure. A counterpart for the nonparametric bootstrap approach, which needs a bias correction, has not been studied. Addressing this gap, our study introduces a bias correction method employing a nonparametric block bootstrap, which approximates the distribution of the KS statistic in assessing the goodness-of-fit of the marginal distribution of a stationary series, accounting for unspecified serial dependence and unspecified parameters. We assess its effectiveness through simulations, scrutinizing both its size and power. The practicality of our method is further illustrated with an examination of stock returns from the S&P 500 index, showcasing its utility in real-world applications.
Title: Nonparametric Block Bootstrap Kolmogorov-Smirnov Goodness-of-Fit Test
Journal: American Statistician. URL: https://doi.org/10.1080/00031305.2025.2588131
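The two ingredients named in the abstract above, a KS statistic computed with estimated parameters and a nonparametric block bootstrap for a stationary series, can be sketched in a few lines. This is only an illustration under stated assumptions: the hypothesized marginal is taken to be normal, the resampling scheme is a plain moving block bootstrap, and the toy AR(1) series, the block length of 10, and all function names are hypothetical. The bias correction that the paper introduces is not described in the abstract and is not reproduced here, so the bootstrap values below should not be read as a valid null reference distribution on their own.

```python
import math
import random
import statistics


def normal_cdf(v, mu, sigma):
    """Normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic_fitted_normal(sample):
    """One-sample KS distance to a normal whose mean and sd are estimated
    from the same sample (assumption: normal marginal)."""
    n = len(sample)
    mu = statistics.fmean(sample)
    sigma = statistics.stdev(sample)
    d = 0.0
    for i, v in enumerate(sorted(sample), start=1):
        f = normal_cdf(v, mu, sigma)
        # Largest gap between the empirical and fitted CDFs at each jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def moving_block_resample(series, block_len, rng):
    """Concatenate randomly chosen overlapping blocks, then trim to length n."""
    n = len(series)
    n_blocks = math.ceil(n / block_len)
    out = []
    for _ in range(n_blocks):
        start = rng.randrange(n - block_len + 1)
        out.extend(series[start:start + block_len])
    return out[:n]


rng = random.Random(0)

# Hypothetical serially dependent data: an AR(1) series with coefficient 0.5.
series = [0.0]
for _ in range(199):
    series.append(0.5 * series[-1] + rng.gauss(0.0, 1.0))

observed = ks_statistic_fitted_normal(series)
boot = [
    ks_statistic_fitted_normal(moving_block_resample(series, 10, rng))
    for _ in range(200)
]
print(observed, min(boot), max(boot))
```

Resampling whole blocks rather than single observations is what lets the bootstrap series keep short-range serial dependence; the choice of block length is a tuning decision that this sketch does not address.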
Pub Date : 2025-10-10 DOI: 10.1080/00031305.2025.2571183
Flavio Chierichetti, Mirko Giacchini, Ravi Kumar
We show that the distance measure implied by the recently proposed Chatterjee coefficient of correlation can violate the triangle inequality, both in theory and in practice.
Title: On the Metricity of the Chatterjee Correlation Coefficient
Journal: American Statistician. URL: https://doi.org/10.1080/00031305.2025.2571183
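For readers unfamiliar with the coefficient discussed in this entry, the following is a minimal sketch of Chatterjee's rank correlation in its standard no-ties form: sort the pairs by x, take the ranks r_i of the corresponding y values, and compute 1 - 3 * sum|r_{i+1} - r_i| / (n^2 - 1). The function name and the toy data are hypothetical. The particular distance derived from the coefficient in the paper is not stated in the abstract, so it is not reproduced here.

```python
def chatterjee_xi(x, y):
    """Chatterjee's xi(x, y), assuming no ties in x or y."""
    n = len(x)
    # Rank of each y value, with 1 for the smallest.
    order_y = sorted(range(n), key=lambda i: y[i])
    rank_y = [0] * n
    for rank, i in enumerate(order_y, start=1):
        rank_y[i] = rank
    # Reorder the y ranks by increasing x.
    order_x = sorted(range(n), key=lambda i: x[i])
    r = [rank_y[i] for i in order_x]
    total = sum(abs(r[k + 1] - r[k]) for k in range(n - 1))
    return 1 - 3 * total / (n * n - 1)


# Hypothetical data: y is a non-monotone function of x.
x = [1, 2, 3, 4, 5, 6]
y = [(v - 3.2) ** 2 for v in x]

print(chatterjee_xi(x, y))  # dependence of y on x
print(chatterjee_xi(y, x))  # dependence of x on y, generally a different value
```

The coefficient is not symmetric in its arguments, as the two calls show, which is one reason any distance built from it needs care.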
Pub Date : 2025-10-08 DOI: 10.1080/00031305.2025.2569464
Andrea Bratsberg, Magne Thoresen, Jelle J. Goeman
For high-dimensional omics data, sparsity-inducing regularization methods such as the Lasso are widely used and often yield strong predictive performance, even in settings when the assumption of sparsity is likely violated. We demonstrate that under a specific dense model, namely the high-dimensional joint latent variable model, the Lasso produces sparse prediction rules with favorable prediction error bounds, even when the underlying regression coefficient vector is not sparse at all. We further argue that this model better represents many types of omics data than sparse linear regression models. We prove that the prediction bound under this model in fact decreases with increasing number of predictors, and confirm this through simulation examples. These results highlight the need for caution when interpreting sparse prediction rules, as strong prediction accuracy of a sparse prediction rule may not imply underlying biological significance of the individual predictors.
Title: Bad estimation, good prediction: the Lasso in dense regimes
Journal: American Statistician. URL: https://doi.org/10.1080/00031305.2025.2569464
Pub Date : 2025-09-26 DOI: 10.1080/00031305.2025.2566251
Ronald Christensen
Title: Linear Model Estimation and Prediction for p>n
Journal: American Statistician. URL: https://doi.org/10.1080/00031305.2025.2566251
Pub Date : 2025-09-23 DOI: 10.1080/00031305.2025.2564268
Nicholas D. Edwards, Enzo de Jong, Feng Liu, Stephen T. Ferguson
Ranked data is commonly used in research across many fields of study including medicine, biology, psychology, and economics. One common statistic used for analyzing ranked data is Kendall’s τ coefficient, a non-parametric measure of rank correlation which describes the strength of the association between two monotonic continuous or ordinal variables. While the mathematics involved in calculating Kendall's τ is well-established, there are relatively few graphing methods available to visualize the results. Here, we describe several alternative and complementary visualization methods and provide an interactive app for graphing Kendall's τ. The resulting graphs provide a visualization of rank correlation which helps display the proportion of concordant and discordant pairs. Moreover, these methods highlight other key features of the data which are not represented by Kendall's τ alone but may nevertheless be meaningful, such as longer monotonic chains and the relationship between discrete pairs of observations. We demonstrate the utility of these approaches through several examples and compare our results to other visualization methods.
Title: Visualizing Kendall’s τ and Hidden Structures in Ranked Data
Journal: American Statistician. URL: https://doi.org/10.1080/00031305.2025.2564268
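The concordant and discordant pairs mentioned in the abstract above are the raw quantities behind Kendall's τ. As a minimal sketch, the following counts them directly and computes the tau-a variant, (C - D) divided by the total number of pairs. The function names and toy data are hypothetical, and this is not the authors' interactive app or their visualization method.

```python
from itertools import combinations


def concordance_counts(x, y):
    """Count concordant, discordant, and tied pairs over all pairs of observations."""
    concordant = discordant = tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x2 - x1) * (y2 - y1)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        else:
            tied += 1         # a tie in x, in y, or in both
    return concordant, discordant, tied


def kendall_tau_a(x, y):
    """Kendall's tau-a: (C - D) / (number of pairs)."""
    c, d, t = concordance_counts(x, y)
    return (c - d) / (c + d + t)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(concordance_counts(x, y))  # (8, 2, 0)
print(kendall_tau_a(x, y))       # 0.6
```

Here 8 of the 10 pairs are concordant and 2 are discordant, so τ = (8 - 2) / 10 = 0.6. The pairwise loop is quadratic in the sample size, which is fine for illustration but slow for large data.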