Pub Date: 2025-12-05 | DOI: 10.1080/00031305.2025.2595972
Lily Agranat-Tamir, Kennedy D. Agwamba, Jazlyn A. Mooney, Noah A. Rosenberg
Title: Shared ancestors and the birthday problem (American Statistician)
Pub Date: 2025-12-04 | DOI: 10.1080/00031305.2025.2595980
David Winkelmann, Rouven Michels
Title: Momentum effects in team sports: analyzing the interplay between offense and defense in the NBA (American Statistician)
Pub Date: 2025-11-19 | DOI: 10.1080/00031305.2025.2590127
Yipeng Wang, Peihua Qiu
Title: Forest expression of networks and their applications (American Statistician)
Pub Date: 2025-11-14 | DOI: 10.1080/00031305.2025.2588128
Louis Davis, Boris Baeumer, Ting Wang
Most point process models for earthquakes in the literature assume that magnitudes are independent and identically distributed. This assumption can hinder a model's ability to describe the main features of data sets containing multiple mainshock-aftershock sequences in succession. This study presents a novel multivariate fractional Hawkes process model designed to capture magnitude-dependent triggering behaviour by incorporating history dependence into the magnitude distribution. This is done by discretising the magnitude range into disjoint intervals and modelling the events whose magnitudes fall in each interval as a subprocess of a mutually exciting Hawkes process that uses the Mittag-Leffler density as its kernel function, so that the point process has a history-dependent mark distribution. We apply this model to two data sets, from Japan and the Middle America Trench, both containing multiple mainshock-aftershock sequences, and compare it with the existing ETAS model using information criteria, residual diagnostics and retrospective prediction performance. For both data sets, all metrics indicate that the multivariate fractional Hawkes process performs favourably against the ETAS model due to its history-dependent magnitude distribution. Furthermore, we are able to infer characteristics of the data sets that cannot be inferred from the ETAS model.
Title: A Multivariate Fractional Hawkes Process for Multiple Earthquake Mainshock Aftershock Sequences (American Statistician)
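To make the construction concrete, here is a minimal Python sketch of the conditional intensities of a multivariate Hawkes process whose subprocesses are magnitude bins and whose kernel is the Mittag-Leffler density, together with the resulting history-dependent mark distribution. The parameter names (`mu`, `alpha`, `beta`, `scale`, `edges`), a single kernel shared by all pairs of subprocesses, and the truncated power-series evaluation of the Mittag-Leffler function are assumptions made for illustration; this is not the authors' specification or fitting code.

```python
import bisect
import math


def ml_function(z, alpha, beta, n_terms=150):
    """Two-parameter Mittag-Leffler function E_{alpha,beta}(z) via its power series.

    Truncated series: adequate only for small |z| (roughly |z| <= 3); larger
    arguments need a dedicated algorithm because the alternating sum cancels badly.
    """
    if z == 0.0:
        return 1.0 / math.gamma(beta)
    log_abs_z = math.log(abs(z))
    total = 0.0
    for k in range(n_terms):
        # |z|^k / Gamma(alpha*k + beta), computed on the log scale to avoid overflow
        term = math.exp(k * log_abs_z - math.lgamma(alpha * k + beta))
        total += term if (z > 0 or k % 2 == 0) else -term
    return total


def ml_density(t, beta, scale):
    """Mittag-Leffler density t^(beta-1) / scale^beta * E_{beta,beta}(-(t/scale)^beta), t > 0."""
    if t <= 0:
        return 0.0
    return t ** (beta - 1) / scale ** beta * ml_function(-((t / scale) ** beta), beta, beta)


def magnitude_bin(mag, edges):
    """Index of the disjoint magnitude interval containing mag (edges = sorted cut points)."""
    return bisect.bisect_right(edges, mag)


def intensities(t, history, mu, alpha, beta, scale, edges):
    """lambda_i(t) = mu[i] + sum over past events k of alpha[i][bin(m_k)] * f(t - t_k)."""
    lam = list(mu)
    for t_k, m_k in history:
        if t_k >= t:
            continue  # only strictly earlier events excite the process
        j = magnitude_bin(m_k, edges)
        kern = ml_density(t - t_k, beta, scale)
        for i in range(len(mu)):
            lam[i] += alpha[i][j] * kern
    return lam


def mark_probabilities(t, history, mu, alpha, beta, scale, edges):
    """History-dependent mark distribution: P(bin i | an event occurs at t, history)."""
    lam = intensities(t, history, mu, alpha, beta, scale, edges)
    total = sum(lam)
    return [v / total for v in lam]


if __name__ == "__main__":
    # Hypothetical values: two bins split at magnitude 5.0, one past M5.6 event at t = 0.
    edges = [5.0]
    mu = [0.5, 0.05]
    alpha = [[0.3, 0.8], [0.02, 0.3]]
    history = [(0.0, 5.6)]
    print(mark_probabilities(0.5, history, mu, alpha, beta=0.8, scale=1.0, edges=edges))
```

With `beta = 1` the Mittag-Leffler density reduces to the exponential density, which recovers a classical exponential-kernel Hawkes process; values of `beta` below 1 give heavier-tailed triggering. Because each bin has its own row of `alpha`, a recent large event can raise the probability that the next event is also large, which is the history dependence in the mark distribution that the abstract describes.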
Pub Date: 2025-11-14 | DOI: 10.1080/00031305.2025.2588131
Mathew Chandy, Elizabeth D. Schifano, Jun Yan, Xianyang Zhang
The Kolmogorov–Smirnov (KS) test is a widely used statistical test that assesses the conformity of a sample to a specified distribution. Its efficacy, however, diminishes with serially dependent data and when parameters of the hypothesized distribution are unknown. For independent data, parametric and nonparametric bootstrap procedures are available to adjust for estimated parameters. For serially dependent stationary data, a parametric bootstrap has been developed with a working serial dependence structure. A nonparametric bootstrap counterpart, which requires a bias correction, has not been studied. Addressing this gap, our study introduces a bias correction method employing a nonparametric block bootstrap, which approximates the distribution of the KS statistic in assessing the goodness-of-fit of the marginal distribution of a stationary series, accounting for unspecified serial dependence and unspecified parameters. We assess its effectiveness through simulations, scrutinizing both its size and power. The practicality of our method is further illustrated with an examination of stock returns from the S&P 500 index, showcasing its utility in real-world applications.
Title: Nonparametric Block Bootstrap Kolmogorov-Smirnov Goodness-of-Fit Test (American Statistician)
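A hedged Python sketch of the general idea, testing marginal normality of a stationary series with estimated mean and standard deviation: resample with a moving block bootstrap, re-estimate the parameters on each resample, and centre each bootstrap discrepancy at the observed discrepancy F_n - F(.; theta_hat). That centring is one standard form of bias correction for the nonparametric bootstrap with estimated parameters; whether it matches the paper's exact correction, block-length rule, or choice of null family is an assumption. The function names and the fixed `block_len` are hypothetical.

```python
import bisect
import math
import random
import statistics


def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ecdf_both_sides(sample, points):
    """ECDF of sample at each point: right-continuous value F(p) and left limit F(p-)."""
    s = sorted(sample)
    n = len(s)
    right = [bisect.bisect_right(s, p) / n for p in points]
    left = [bisect.bisect_left(s, p) / n for p in points]
    return right, left


def moving_block_resample(x, block_len, rng):
    """Moving block bootstrap: concatenate overlapping blocks with random starts, trim to len(x)."""
    n = len(x)
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)
        out.extend(x[start:start + block_len])
    return out[:n]


def ks_block_bootstrap(x, block_len=10, n_boot=500, seed=0):
    """KS test that the marginal of stationary series x is normal with unknown mean and sd.

    Returns (observed sqrt(n) * KS statistic, bootstrap p-value).
    """
    rng = random.Random(seed)
    n = len(x)
    pts = sorted(set(x))
    mu_hat, sd_hat = statistics.fmean(x), statistics.stdev(x)
    fit = [normal_cdf(p, mu_hat, sd_hat) for p in pts]
    f_r, f_l = ecdf_both_sides(x, pts)
    # Exact sup for a continuous null CDF: check both sides of every ECDF jump.
    d_obs = math.sqrt(n) * max(max(abs(r - g), abs(l - g)) for r, l, g in zip(f_r, f_l, fit))
    exceed = 0
    for _ in range(n_boot):
        xb = moving_block_resample(x, block_len, rng)
        mu_b, sd_b = statistics.fmean(xb), statistics.stdev(xb)  # re-estimate parameters
        b_r, b_l = ecdf_both_sides(xb, pts)
        d_b = 0.0
        for k, p in enumerate(pts):
            g_b = normal_cdf(p, mu_b, sd_b)
            # Centred discrepancy: subtract the observed F_n - F(.; theta_hat).
            # Checked only at the data points, so this sup is an approximation.
            d_b = max(d_b,
                      abs((b_r[k] - g_b) - (f_r[k] - fit[k])),
                      abs((b_l[k] - g_b) - (f_l[k] - fit[k])))
        if math.sqrt(n) * d_b >= d_obs:
            exceed += 1
    return d_obs, (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    gen = random.Random(1)
    series, prev = [], 0.0
    for _ in range(300):
        prev = 0.5 * prev + gen.gauss(0.0, 1.0)  # hypothetical AR(1): marginally normal
        series.append(prev)
    stat, p_value = ks_block_bootstrap(series, block_len=15, n_boot=300)
    print(f"sqrt(n) * KS = {stat:.3f}, bootstrap p-value = {p_value:.3f}")
```

Blocks preserve the short-range serial dependence within each resample, which is why the bootstrap distribution can reflect dependence that an i.i.d. resample would ignore; the block length is a tuning choice.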
Pub Date: 2025-10-10 | DOI: 10.1080/00031305.2025.2571183
Flavio Chierichetti, Mirko Giacchini, Ravi Kumar
We show that the distance measure implied by the recently proposed Chatterjee coefficient of correlation can violate the triangle inequality, both in theory and in practice.
Title: On the Metricity of the Chatterjee Correlation Coefficient (American Statistician)
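For readers who want to probe the claim, here is a small Python sketch of Chatterjee's coefficient for data without ties, xi_n = 1 - 3 * sum |r_{i+1} - r_i| / (n^2 - 1), where the r_i are the ranks of y after sorting the pairs by x, plus a triangle-inequality check. The distance d(a, b) = 1 - xi_n(a, b) and the helper names are our assumptions for illustration; the paper's implied distance may be constructed differently (for example, symmetrized).

```python
import itertools


def chatterjee_xi(x, y):
    """Chatterjee's xi_n(X, Y), assuming no ties in x or y.

    Sort pairs by x; with r_i the rank of the y value in position i,
    xi_n = 1 - 3 * sum_i |r_{i+1} - r_i| / (n^2 - 1).
    """
    n = len(x)
    by_y = sorted(range(n), key=lambda i: y[i])
    rank = {idx: r for r, idx in enumerate(by_y, start=1)}
    r = [rank[i] for i in sorted(range(n), key=lambda i: x[i])]
    return 1.0 - 3.0 * sum(abs(r[i + 1] - r[i]) for i in range(n - 1)) / (n * n - 1)


def xi_distance(a, b):
    """Hypothetical distance d(a, b) = 1 - xi_n(a, b); the paper's construction may differ."""
    return 1.0 - chatterjee_xi(a, b)


def triangle_excess(a, b, c):
    """d(a, c) - (d(a, b) + d(b, c)); a positive value means the triangle inequality fails."""
    return xi_distance(a, c) - (xi_distance(a, b) + xi_distance(b, c))


if __name__ == "__main__":
    # Brute-force search over all ordered triples of permutations of 1..4.
    perms = list(itertools.permutations(range(1, 5)))
    worst = max(triangle_excess(a, b, c) for a in perms for b in perms for c in perms)
    print(f"largest triangle excess found: {worst:.4f}")
```

Note that under this naive definition d(a, a) = 3/(n + 1) rather than 0 when there are no ties, so a careful treatment of metricity has to fix the exact distance first.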
Pub Date: 2025-10-08 | DOI: 10.1080/00031305.2025.2569464
Andrea Bratsberg, Magne Thoresen, Jelle J. Goeman
For high-dimensional omics data, sparsity-inducing regularization methods such as the Lasso are widely used and often yield strong predictive performance, even in settings where the assumption of sparsity is likely violated. We demonstrate that under a specific dense model, namely the high-dimensional joint latent variable model, the Lasso produces sparse prediction rules with favorable prediction error bounds, even when the underlying regression coefficient vector is not sparse at all. We further argue that this model represents many types of omics data better than sparse linear regression models do. We prove that the prediction bound under this model in fact decreases as the number of predictors increases, and we confirm this through simulation examples. These results highlight the need for caution when interpreting sparse prediction rules, as the strong prediction accuracy of a sparse prediction rule may not imply that the individual predictors it selects are biologically significant.
Title: Bad estimation, good prediction: the Lasso in dense regimes (American Statistician)
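A minimal pure-Python sketch of the setting: data drawn from a hypothetical joint latent variable model (predictors and response share a few latent factors, so the regression of y on X is not sparse) and a cyclic coordinate-descent Lasso for (1/2n) * ||y - Xb||^2 + lam * ||b||_1. The generative model, noise level, `lam`, and helper names are illustrative assumptions, not the paper's exact model or simulation design.

```python
import random


def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2n) * ||y - X b||^2 + lam * ||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (coordinate j left out)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
    return b


def draw_latent(n, A, g, rng, noise=0.5):
    """Hypothetical latent variable model: X = Z A + E and y = Z g + e with shared factors Z."""
    k, p = len(A), len(A[0])
    Z = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    X = [[sum(Z[i][m] * A[m][j] for m in range(k)) + rng.gauss(0.0, noise) for j in range(p)]
         for i in range(n)]
    y = [sum(Z[i][m] * g[m] for m in range(k)) + rng.gauss(0.0, noise) for i in range(n)]
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    p, k = 50, 2
    A = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(k)]
    g = [rng.gauss(0.0, 1.0) for _ in range(k)]
    X_tr, y_tr = draw_latent(60, A, g, rng)
    X_te, y_te = draw_latent(500, A, g, rng)
    b = lasso_cd(X_tr, y_tr, lam=0.1)
    pred = [sum(xj * bj for xj, bj in zip(row, b)) for row in X_te]
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y_te, pred)) / len(y_te)
    print(f"nonzero coefficients: {sum(bj != 0.0 for bj in b)} of {p}; test MSE: {mse:.3f}")
```

Comparing the number of selected predictors with the test error across several values of `p` is one way to see, informally, how a sparse rule can predict well even though the selected coefficients need not reflect a sparse truth.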
Pub Date: 2025-09-26 | DOI: 10.1080/00031305.2025.2566251
Ronald Christensen
Title: Linear Model Estimation and Prediction for p > n (American Statistician)
Pub Date: 2025-09-23 | DOI: 10.1080/00031305.2025.2564268
Nicholas D. Edwards, Enzo de Jong, Feng Liu, Stephen T. Ferguson
Ranked data are commonly used in research across many fields of study, including medicine, biology, psychology, and economics. One common statistic for analyzing ranked data is Kendall's τ coefficient, a nonparametric measure of rank correlation that describes the strength of the monotonic association between two continuous or ordinal variables. While the mathematics of calculating Kendall's τ is well established, relatively few graphing methods are available to visualize the results. Here, we describe several alternative and complementary visualization methods and provide an interactive app for graphing Kendall's τ. The resulting graphs visualize rank correlation and help display the proportion of concordant and discordant pairs. Moreover, these methods highlight other key features of the data that are not represented by Kendall's τ alone but may nevertheless be meaningful, such as longer monotonic chains and the relationship between discrete pairs of observations. We demonstrate the utility of these approaches through several examples and compare our results with other visualization methods.
Title: Visualizing Kendall’s τ and Hidden Structures in Ranked Data (American Statistician)
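The quantities these graphs display can be computed directly. Below is a short Python sketch that counts concordant and discordant pairs, forms Kendall's tau-a = (C - D) / (n(n - 1)/2), and finds the longest chain of observations increasing in both variables. Reading "longer monotonic chains" as the longest increasing chain, using tau-a (which differs from tau-b when there are ties), and the function names are our assumptions, not the authors' app.

```python
from bisect import bisect_left


def concordance_counts(x, y):
    """Count concordant and discordant pairs (pairs tied in x or y count as neither)."""
    c = d = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                c += 1
            elif s < 0:
                d += 1
    return c, d


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    c, d = concordance_counts(x, y)
    return (c - d) / (n * (n - 1) / 2)


def longest_increasing_chain(x, y):
    """Length of the longest chain increasing in both x and y (no ties assumed)."""
    ys = [yv for _, yv in sorted(zip(x, y))]
    tails = []  # tails[k] = smallest tail of an increasing chain of length k + 1
    for v in ys:
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(tails)


if __name__ == "__main__":
    x, y = [1, 2, 3, 4], [1, 3, 2, 4]
    print(concordance_counts(x, y), kendall_tau_a(x, y), longest_increasing_chain(x, y))
```

In the example, five of the six pairs are concordant and one is discordant, so tau-a is 4/6; the longest chain has three observations, a feature that tau alone does not report.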
Pub Date: 2025-09-22 | DOI: 10.1080/00031305.2025.2563730
Seungwoo Kang, Hee-Seok Oh
We introduce novel measures, L1 prestige and L1 centrality, for quantifying the prominence of each vertex in a strongly connected directed graph by utilizing the concept of L1 data depth (Vardi and Zhang, Proc. Natl. Acad. Sci. U.S.A. 97(4):1423–1426, 2000). The former measure quantifies how prominent each vertex is in receiving choices, whereas the latter evaluates its importance in giving choices. The proposed measures can handle graphs with both edge and vertex weights, as well as undirected graphs. However, examining a graph using a measure defined at a single ‘scale’ inevitably loses information, as each vertex may exhibit distinct structural characteristics at different levels of locality. To this end, we further develop local versions of the proposed measures with a tunable locality parameter. Using these tools, we present a multiscale network analysis framework that provides much richer structural information about each vertex than a single-scale inspection. Applying the proposed measures to networks constructed from the Seoul Mobility Flow Data demonstrates that they accurately depict and uncover the inherent characteristics of individual city regions.
Title: L1 Prominence Measures for Directed Graphs (American Statistician)
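The prestige and centrality measures build on L1 data depth (Vardi and Zhang, 2000). As a hedged sketch of that underlying notion only, the Python function below computes, as we understand the definition, the L1 depth of a point y with respect to weighted points: D(y) = 1 - max(0, ||e_bar(y)|| - f(y)), where e_bar(y) is the weight-averaged unit vector from y to the other points and f(y) is the normalized weight located at y. How the paper maps vertices, edge weights, and vertex weights onto points and weights, and its local (multiscale) versions, are not reproduced here; the name `l1_depth` is ours.

```python
import math


def l1_depth(y, points, weights=None):
    """L1 data depth of point y with respect to weighted points in R^d.

    D(y) = 1 - max(0, ||e_bar(y)|| - f(y)), where e_bar(y) is the weighted mean
    of unit vectors from y to the data points distinct from y, and f(y) is the
    normalized weight sitting exactly at y (0 if y is not a data point).
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    dim = len(y)
    e_bar = [0.0] * dim
    mass_at_y = 0.0
    for x, w in zip(points, weights):
        diff = [xi - yi for xi, yi in zip(x, y)]
        norm = math.sqrt(sum(t * t for t in diff))
        if norm == 0.0:
            mass_at_y += w  # y coincides with this data point
            continue
        for k in range(dim):
            e_bar[k] += w * diff[k] / (norm * total)
    e_norm = math.sqrt(sum(t * t for t in e_bar))
    return 1.0 - max(0.0, e_norm - mass_at_y / total)


if __name__ == "__main__":
    square = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    for point in [(0.0, 0.0), (1.0, 0.0), (3.0, 3.0)]:
        print(point, round(l1_depth(point, square), 4))
```

Depth is 1 at the weighted L1 median (the centre of the symmetric example) and decreases toward 0 for points far from the data, which is what makes it usable as a prominence score once vertices are represented as weighted points.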