Summary In network analysis, we frequently need to conduct inference for network parameters based on a single observed network. Since the sampling distribution of the statistic is often unknown, we need to rely on the bootstrap. However, due to the complex dependence structure among vertices, existing bootstrap methods often yield unsatisfactory performance, especially under small or moderate sample sizes. To this end, we propose a new network bootstrap procedure, termed the local bootstrap, to estimate the standard errors of network statistics. We propose to resample the observed vertices along with their neighbor sets, and to reconstruct the edges between the resampled vertices by drawing from the set of edges connecting their neighbor sets. We justify the proposed method theoretically, establishing desirable asymptotic properties for statistics such as motif densities, and demonstrate its excellent numerical performance in small and moderate sample sizes. Our method includes several existing methods, such as the empirical graphon bootstrap, as special cases. We investigate the advantages of the proposed method over existing methods through the lenses of edge randomness, vertex heterogeneity, and neighbor-set size, which sheds some light on the complex issue of network bootstrapping.
{"title":"Local Bootstrap for Network Data","authors":"Tianhai Zu, Yichen Qin","doi":"10.1093/biomet/asae046","DOIUrl":"https://doi.org/10.1093/biomet/asae046","url":null,"abstract":"SUMMARY In network analysis, we frequently need to conduct inference for network parameters based on one observed network. Since the sampling distribution of the statistic is often unknown, we need to rely on the bootstrap. However, due to the complex dependence structure among vertices, existing bootstrap methods often yield unsatisfactory performance, especially under small or moderate sample sizes. To this end, we propose a new network bootstrap procedure, termed local bootstrap, to estimate the standard errors of network statistics. We propose to resample the observed vertices along with their neighbor sets, and reconstruct the edges between the resampled vertices by drawing from the set of edges connecting their neighbor sets. We justify the proposed method theoretically with desirable asymptotic properties for statistics such as motif density, and demonstrate its excellent numerical performance in small and moderate sample sizes. Our method includes several existing methods, such as the empirical graphon bootstrap, as special cases. We investigate the advantages of the proposed methods over the existing methods through the lens of edge randomness, vertex heterogeneity, neighbor set size, which shed some light on the complex issue of network bootstrapping.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"15 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Summary We prove that an m out of n bootstrap procedure for Chatterjee's rank correlation is consistent whenever asymptotic normality of Chatterjee's rank correlation can be established. In particular, we prove that the m out of n bootstrap works for continuous data as well as for discrete data with independent coordinates; furthermore, simulations indicate that it also performs well for discrete data with dependent coordinates, and that it outperforms alternative estimation methods. Consistency of the bootstrap is proved in both the Kolmogorov and the Wasserstein distance.
{"title":"A Simple Bootstrap for Chatterjee's Rank Correlation","authors":"H Dette, M Kroll","doi":"10.1093/biomet/asae045","DOIUrl":"https://doi.org/10.1093/biomet/asae045","url":null,"abstract":"SUMMARY We prove that an m out of n bootstrap procedure for Chatterjee's rank correlation is consistent whenever asymptotic normality of Chatterjee's rank correlation can be established. In particular, we prove that m out of n bootstrap works for continuous as well as for discrete data with independent coordinates; furthermore, simulations indicate that it also performs well for discrete data with dependent coordinates, and that it outperforms alternative estimation methods. Consistency of the bootstrap is proved in the Kolmogorov as well as in the Wasserstein distance.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"13 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Consider sensitivity analysis for causal inference in a longitudinal study with time-varying treatments and covariates. It is of interest to assess the worst-case possible values of counterfactual-outcome means and average treatment effects under sequential unmeasured confounding. We formulate several multi-period sensitivity models to relax the corresponding versions of the assumption of sequential non-confounding. The primary sensitivity model involves only counterfactual outcomes, whereas the joint and product sensitivity models involve both counterfactual covariates and outcomes. We establish and compare explicit representations for the sharp and conservative bounds at the population level through convex optimization, depending only on the observed data. These results provide for the first time a satisfactory generalization from the marginal sensitivity model in the cross-sectional setting.
{"title":"Sensitivity models and bounds under sequential unmeasured confounding in longitudinal studies","authors":"Zhiqiang Tan","doi":"10.1093/biomet/asae044","DOIUrl":"https://doi.org/10.1093/biomet/asae044","url":null,"abstract":"Consider sensitivity analysis for causal inference in a longitudinal study with time-varying treatments and covariates. It is of interest to assess the worst-case possible values of counterfactual-outcome means and average treatment effects under sequential unmeasured confounding. We formulate several multi-period sensitivity models to relax the corresponding versions of the assumption of sequential non-confounding. The primary sensitivity model involves only counterfactual outcomes, whereas the joint and product sensitivity models involve both counterfactual covariates and outcomes. We establish and compare explicit representations for the sharp and conservative bounds at the population level through convex optimization, depending only on the observed data. These results provide for the first time a satisfactory generalization from the marginal sensitivity model in the cross-sectional setting.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"13 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statisticians and epidemiologists generally cite the publications by Prentice & Breslow and by Breslow et al. in 1978 as the first description and use of conditional logistic regression, while economists cite the 1973 book chapter by Nobel laureate McFadden. We describe the until-now-unrecognized use of, and way of fitting, this model in 1934 by Lionel Penrose and Ronald Fisher.
{"title":"Studies in the history of probability and statistics, LI: the first conditional logistic regression","authors":"J A Hanley","doi":"10.1093/biomet/asae038","DOIUrl":"https://doi.org/10.1093/biomet/asae038","url":null,"abstract":"Statisticians and epidemiologists generally cite the publications by Prentice & Breslow and by Breslow et al. in 1978 as the first description and use of conditional logistic regression, while economists cite the 1973 book chapter by Nobel laureate McFadden. We describe the until-now-unrecognized use of, and way of fitting, this model in 1934 by Lionel Penrose and Ronald Fisher.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"116 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141935803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Summary One of the most important problems in precision medicine is to find the optimal individualized treatment rule, which is designed to recommend treatment decisions and maximize overall clinical benefit to patients based on their individual characteristics. Typically, the expected clinical outcome must be estimated first, and for most existing statistical methods this requires assuming an outcome regression model or a propensity score model. However, if either model assumption is invalid, the estimated treatment regime is not reliable. In this article, we first define a contrast value function, which forms the basis of our study of individualized treatment regimes. We then construct a hybrid estimator of the contrast value function by combining two types of estimation methods. We further propose a robust covariate-balancing estimator of the contrast value function that combines inverse probability weighting and matching, built on the covariate-balancing propensity score proposed by Imai and Ratkovic (2014). Theoretical results show that the proposed estimator is doubly robust; that is, it is consistent if either the propensity score model or the matching is correct. Through extensive simulation studies, we demonstrate that the proposed estimator outperforms existing methods. Lastly, the proposed method is illustrated through an analysis of the SUPPORT study.
{"title":"Robust Covariate-Balancing Method in Learning Optimal Individualized Treatment Regimes","authors":"Canhui Li, Donglin Zeng, Wensheng Zhu","doi":"10.1093/biomet/asae036","DOIUrl":"https://doi.org/10.1093/biomet/asae036","url":null,"abstract":"Summary One of the most important problems in precision medicine is to find the optimal individualized treatment rule, which is designed to recommend treatment decisions and maximize overall clinical benefit to patients based on their individual characteristics. Typically, the expected clinical outcome is required to be estimated first, in which an outcome regression model or a propensity score model usually needs to be assumed for most of the existing statistical methods. However, if either model assumption is invalid, the estimated treatment regime is not reliable. In this article, we first define a contrast value function, which is the basis of the study for individualized treatment regimes. Then we construct a hybrid estimator of the contrast value function, by combining two types of estimation methods. We further propose a robust covariate-balancing estimator of the contrast value function by combining the inverse probability weighted method and matching method, which is based on the covariate balancing propensity score proposed by Imai and Ratkovic (2014). Theoretical results show that the proposed estimator is doubly robust, that is, it is consistent if either the propensity score model or the matching is correct. Based on a large number of simulation studies, we demonstrate that the proposed estimator outperforms existing methods. Lastly, the proposed method is illustrated through analysis of the SUPPORT study.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"337 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141740777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Summary Proximal causal inference was recently proposed as a framework for identifying causal effects from observational data in the presence of hidden confounders for which proxies are available. In this paper, we extend the proximal causal inference approach to settings where identification of causal effects hinges upon a set of mediators that are not observed, yet error-prone proxies of the hidden mediators are measured. Specifically, (i) we establish causal hidden mediation analysis, which extends classical causal mediation analysis methods for identifying natural direct and indirect effects under no unmeasured confounding to a setting where the mediator of interest is hidden, but proxies of it are available; (ii) we establish a hidden front-door criterion, which extends the classical front-door criterion to allow for hidden mediators for which proxies are available; (iii) we show that identification of a certain causal effect, called the population intervention indirect effect, remains possible with hidden mediators in settings where the challenges in (i) and (ii) may co-exist. We view (i)-(iii) as important steps towards the practical application of front-door criteria and mediation analysis, since mediators are almost always measured with error and thus the most one can hope for in practice is that the measurements are at best proxies of the mediating mechanisms. We propose identification approaches for the parameters of interest in the models considered. For estimation, we propose an influence-function-based method and analyze the robustness of the resulting estimators.
{"title":"Causal inference with hidden mediators","authors":"AmirEmad Ghassami, Alan Yang, Ilya Shpitser, Eric Tchetgen Tchetgen","doi":"10.1093/biomet/asae037","DOIUrl":"https://doi.org/10.1093/biomet/asae037","url":null,"abstract":"Summary Proximal causal inference was recently proposed as a framework to identify causal effects from observational data in the presence of hidden confounders for which proxies are available. In this paper, we extend the proximal causal inference approach to settings where identification of causal effects hinges upon a set of mediators which are not observed, yet error prone proxies of the hidden mediators are measured. Specifically, (i) we establish causal hidden mediation analysis, which extends classical causal mediation analysis methods for identifying natural direct and indirect effects under no unmeasured confounding to a setting where the mediator of interest is hidden, but proxies of it are available. (ii) We establish a hidden front-door criterion, which extends the classical front-door criterion to allow for hidden mediators for which proxies are available. (iii) We show that the identification of a certain causal effect called population intervention indirect effect remains possible with hidden mediators in settings where challenges in (i) and (ii) might co-exist. We view (i)-(iii) as important steps towards the practical application of front-door criteria and mediation analysis as mediators are almost always measured with error and thus, the most one can hope for in practice is that the measurements are at best proxies of mediating mechanisms. We propose identification approaches for the parameters of interest in our considered models. For the estimation aspect, we propose an influence function-based estimation method and provide an analysis for the robustness of the estimators.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"249 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141718261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Summary It is conventionally believed that permutation-based testing methods should ideally use all permutations. We challenge this by showing that we can sometimes obtain dramatically more power by using only a tiny subgroup of permutations. As the subgroup is tiny, this also comes at a much lower computational cost. Moreover, the method remains valid for the same hypotheses. We exploit this to improve the popular permutation-based Westfall & Young MaxT multiple testing method. We analyze the relative efficiency in a Gaussian location model, and find the largest gain in high dimensions.
{"title":"More Power by Using Fewer Permutations","authors":"Nick W Koning","doi":"10.1093/biomet/asae031","DOIUrl":"https://doi.org/10.1093/biomet/asae031","url":null,"abstract":"Summary It is conventionally believed that permutation-based testing methods should ideally use all permutations. We challenge this by showing we can sometimes obtain dramatically more power by using a tiny subgroup. As the subgroup is tiny, this also comes at a much lower computational cost. Moreover, the method remains valid for the same hypotheses. We exploit this to improve the popular permutation-based Westfall & Young MaxT multiple testing method. We analyze the relative efficiency in a Gaussian location model, and find the largest gain in high dimensions.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"377 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141585906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Summary With the advance of science and technology, more and more data are collected in the form of functions. A fundamental question for a pair of random functions is whether they are independent. This problem becomes quite challenging when the random trajectories are sampled irregularly and sparsely for each subject. In other words, each random function is sampled at only a few time-points, and these time-points vary across subjects. Furthermore, the observed data may contain noise. To the best of our knowledge, no consistent test exists in the literature for the independence of sparsely observed functional data. We show in this work that simultaneously testing pointwise independence is feasible. The test statistics are constructed by integrating pointwise distance covariances (Székely et al., 2007) and are shown to converge, at a certain rate, to their population counterparts, which characterize the simultaneous pointwise independence of two random functions. The performance of the proposed methods is further verified by Monte Carlo simulations and an analysis of real data.
{"title":"Testing Independence for Sparse Longitudinal Data","authors":"Changbo Zhu, Junwen Yao, Jane-Ling Wang","doi":"10.1093/biomet/asae035","DOIUrl":"https://doi.org/10.1093/biomet/asae035","url":null,"abstract":"Summary With the advance of science and technology, more and more data are collected in the form of functions. A fundamental question for a pair of random functions is to test whether they are independent. This problem becomes quite challenging when the random trajectories are sampled irregularly and sparsely for each subject. In other words, each random function is only sampled at a few time-points, and these time-points vary with subjects. Furthermore, the observed data may contain noise. To the best of our knowledge, there exists no consistent test in the literature to test the independence of sparsely observed functional data. We show in this work that testing pointwise independence simultaneously is feasible. The test statistics are constructed by integrating pointwise distance covariances (Székely et al., 2007) and are shown to converge, at a certain rate, to their corresponding population counterparts, which characterize the simultaneous pointwise independence of two random functions. The performance of the proposed methods is further verified by Monte Carlo simulations and analysis of real data.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"18 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141566703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Summary We explore how much knowing a parametric restriction on propensity scores improves semiparametric efficiency bounds in the potential outcome framework. For stratified propensity scores, considered as a parametric model, we derive explicit formulas for the efficiency gain from knowing how the covariate space is split. Based on these, we find that the efficiency gain decreases as the partition of the stratification becomes finer. For general parametric models, where it is hard to obtain explicit representations of efficiency bounds, we propose a novel framework that enables us to see whether knowing a parametric model is valuable in terms of efficiency even when it is high-dimensional. In addition to the intuitive fact that knowing the parametric model does not help much if it is sufficiently flexible, we discover that the efficiency gain can be nearly zero even though the parametric assumption significantly restricts the space of possible propensity scores.
{"title":"Semiparametric efficiency gains from parametric restrictions on propensity scores","authors":"Haruki Kono","doi":"10.1093/biomet/asae034","DOIUrl":"https://doi.org/10.1093/biomet/asae034","url":null,"abstract":"Summary We explore how much knowing a parametric restriction on propensity scores improves semiparametric efficiency bounds in the potential outcome framework. For stratified propensity scores, considered as a parametric model, we derive explicit formulas for the efficiency gain from knowing how the covariate space is split. Based on these, we find that the efficiency gain decreases as the partition of the stratification becomes finer. For general parametric models, where it is hard to obtain explicit representations of efficiency bounds, we propose a novel framework that enables us to see whether knowing a parametric model is valuable in terms of efficiency even when it is high-dimensional. In addition to the intuitive fact that knowing the parametric model does not help much if it is sufficiently flexible, we discover that the efficiency gain can be nearly zero even though the parametric assumption significantly restricts the space of possible propensity scores.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"22 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141566705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Summary Welch's method provides an estimator of the power spectral density that is statistically consistent. This is achieved by averaging over periodograms calculated from overlapping segments of a time series. For a finite-length time series, while the variance of the estimator decreases as the number of segments increases, the magnitude of the estimator's bias increases: a bias-variance trade-off ensues when setting the segment number. We address this issue with a novel method for debiasing Welch's method that maintains the computational complexity and asymptotic consistency, and leads to improved finite-sample performance. Theoretical results are given for fourth-order stationary processes with finite fourth-order moments and an absolutely convergent fourth-order cumulant function. The significant bias reduction is demonstrated with numerical simulations and an application to real-world data. Our estimator also permits irregular spacing over frequency, and we demonstrate how this may be employed for signal compression and further variance reduction. Code accompanying this work is available in R and Python.
{"title":"Debiasing Welch’s Method for Spectral Density Estimation","authors":"Lachlan C Astfalck, Adam M Sykulski, Edward J Cripps","doi":"10.1093/biomet/asae033","DOIUrl":"https://doi.org/10.1093/biomet/asae033","url":null,"abstract":"Summary Welch’s method provides an estimator of the power spectral density that is statistically consistent. This is achieved by averaging over periodograms calculated from overlapping segments of a time series. For a finite length time series, while the variance of the estimator decreases as the number of segments increase, the magnitude of the estimator’s bias increases: a bias-variance trade-off ensues when setting the segment number. We address this issue by providing a novel method for debiasing Welch’s method which maintains the computational complexity and asymptotic consistency, and leads to improved finite-sample performance. Theoretical results are given for fourth-order stationary processes with finite fourth-order moments and absolutely convergent fourth-order cumulant function. The significant bias reduction is demonstrated with numerical simulation and an application to real-world data. Our estimator also permits irregular spacing over frequency and we demonstrate how this may be employed for signal compression and further variance reduction. Code accompanying this work is available in R and python.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"7 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}