Annals of Statistics: Latest Articles


ON BLOCKWISE AND REFERENCE PANEL-BASED ESTIMATORS FOR GENETIC DATA PREDICTION IN HIGH DIMENSIONS.

IF 3.2 Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2024-06-01 Epub Date: 2024-08-11 DOI: 10.1214/24-aos2378

Bingxin Zhao, Shurong Zheng, Hongtu Zhu

Genetic prediction holds immense promise for translating genetic discoveries into medical advances. As the high-dimensional covariance matrix (or the linkage disequilibrium (LD) pattern) of genetic variants often presents a block-diagonal structure, numerous methods account for the dependence among variants in predetermined local LD blocks. Moreover, due to privacy considerations and data protection concerns, genetic variant dependence in each LD block is typically estimated from external reference panels rather than the original training data set. This paper presents a unified analysis of blockwise and reference panel-based estimators in a high-dimensional prediction framework without sparsity restrictions. We find that, surprisingly, even when the covariance matrix has a block-diagonal structure with well-defined boundaries, blockwise estimation methods adjusting for local dependence can be substantially less accurate than methods controlling for the whole covariance matrix. Further, estimation methods built on the original training data set and external reference panels are likely to have varying performance in high dimensions, which may reflect the cost of having access only to summary-level data from the training data set. This analysis is based on novel results in random matrix theory for block-diagonal covariance matrices. We numerically evaluate our results using extensive simulations and real data analysis in the UK Biobank.


Annals of Statistics, vol. 52, no. 3, pp. 948-965. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11391480/pdf/

Citations: 0

Minimax rates for heterogeneous causal effect estimation.

IF 3.2 Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2024-04-01 Epub Date: 2024-05-09 DOI: 10.1214/24-aos2369

Edward H Kennedy, Sivaraman Balakrishnan, James M Robins, Larry Wasserman

Estimation of heterogeneous causal effects - i.e., how effects of policies and treatments vary across subjects - is a fundamental task in causal inference. Many methods for estimating conditional average treatment effects (CATEs) have been proposed in recent years, but questions surrounding optimality have remained largely unanswered. In particular, a minimax theory of optimality has yet to be developed, with the minimax rate of convergence and construction of rate-optimal estimators remaining open problems. In this paper we derive the minimax rate for CATE estimation, in a Hölder-smooth nonparametric model, and present a new local polynomial estimator, giving high-level conditions under which it is minimax optimal. Our minimax lower bound is derived via a localized version of the method of fuzzy hypotheses, combining lower bound constructions for nonparametric regression and functional estimation. Our proposed estimator can be viewed as a local polynomial R-Learner, based on a localized modification of higher-order influence function methods. The minimax rate we find exhibits several interesting features, including a non-standard elbow phenomenon and an unusual interpolation between nonparametric regression and functional estimation rates. The latter quantifies how the CATE, as an estimand, can be viewed as a regression/functional hybrid.


Annals of Statistics, vol. 52, no. 2, pp. 793-816. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11960818/pdf/

Citations: 0

RANK-BASED INDICES FOR TESTING INDEPENDENCE BETWEEN TWO HIGH-DIMENSIONAL VECTORS.

IF 3.2 Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2024-02-01 Epub Date: 2024-03-07 DOI: 10.1214/23-aos2339

Yeqing Zhou, Kai Xu, Liping Zhu, Runze Li

To test independence between two high-dimensional random vectors, we propose three tests based on the rank-based indices derived from Hoeffding's $D$, Blum-Kiefer-Rosenblatt's $R$ and Bergsma-Dassios-Yanagimoto's $\tau^{*}$. Under the null hypothesis of independence, we show that the distributions of the proposed test statistics converge to normal ones if the dimensions diverge arbitrarily with the sample size. We further derive an explicit rate of convergence. Thanks to the monotone transformation-invariant property, these distribution-free tests can be readily applied to generally distributed random vectors, including heavy-tailed ones. We further study the local power of the proposed tests and compare their relative efficiencies with two classic distance covariance/correlation based tests in high-dimensional settings. We establish explicit relationships between $D$, $R$, $\tau^{*}$ and Pearson's correlation for bivariate normal random variables. The relationships serve as a basis for power comparison. Our theoretical results show that under a Gaussian equicorrelation alternative, (i) the proposed tests are superior to the two classic distance covariance/correlation based tests if the components of random vectors have very different scales; (ii) the asymptotic efficiencies of the proposed tests based on $D$, $\tau^{*}$ and $R$ are sorted in descending order.


Annals of Statistics, vol. 52, no. 1, pp. 184-206. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11064990/pdf/

Citations: 0

Order-of-addition orthogonal arrays to study the effect of treatment ordering

Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2023-08-01 DOI: 10.1214/23-aos2317

Eric D. Schoen, Robert W. Mee

The effect of the order in which a set of m treatments is applied can be modeled by relative-position factors that indicate whether treatment i is carried out before or after treatment j, or by the absolute position for treatment i in the sequence. A design with the same normalized information matrix as the design with all m! sequences is D- and G-optimal for the main-effects model involving the relative-position factors. We prove that such designs are also I-optimal for this model and D-optimal as well as G- and I-optimal for the first-order model in the absolute-position factors. We propose a methodology for a complete or partial enumeration of nonequivalent designs that are optimal for both models.
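The relative-position factors and the normalized information matrix mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual ±1 pairwise-order coding (+1 if treatment i precedes treatment j, otherwise -1) and a main-effects model with an intercept; the function names `pwo_row` and `normalized_information_matrix` are hypothetical and are not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations, permutations


def pwo_row(sequence):
    """Relative-position (pairwise-order) factors for one treatment sequence.

    Assumed coding: for each pair i < j the factor is +1 if treatment i is
    applied before treatment j in `sequence`, and -1 otherwise.
    """
    position = {treatment: k for k, treatment in enumerate(sequence)}
    treatments = sorted(sequence)
    return [1 if position[i] < position[j] else -1
            for i, j in combinations(treatments, 2)]


def normalized_information_matrix(design):
    """Return X'X / N, where X has an intercept column plus the factors."""
    rows = [[1] + pwo_row(seq) for seq in design]
    n_runs, n_cols = len(rows), len(rows[0])
    return [[Fraction(sum(r[a] * r[b] for r in rows), n_runs)
             for b in range(n_cols)]
            for a in range(n_cols)]


m = 3
full_design = list(permutations(range(1, m + 1)))  # all m! sequences
info = normalized_information_matrix(full_design)
for row in info:
    print([str(entry) for entry in row])
```

A smaller design could be compared against the full design by passing a subset of sequences to `normalized_information_matrix` and checking whether the two matrices are equal, which is the property the abstract uses to characterize optimal designs.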


Citations: 0

Matching recovery threshold for correlated random graphs

Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2023-08-01 DOI: 10.1214/23-aos2305

Jian Ding, Hang Du

For two correlated graphs which are independently sub-sampled from a common Erdős–Rényi graph $G(n,p)$, we wish to recover their latent vertex matching from the observation of these two graphs without labels. When $p = n^{-\alpha + o(1)}$ for $\alpha \in (0,1]$, we establish a sharp information-theoretic threshold for whether it is possible to correctly match a positive fraction of vertices. Our result sharpens a constant factor in a recent work by Wu, Xu and Yu.
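The data-generating setup described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the edge-retention probability `s`, the function name `correlated_graph_pair`, and the choice of a uniformly random relabelling for the hidden matching are all assumptions introduced here.

```python
import random
from itertools import combinations


def correlated_graph_pair(n, p, s, seed=0):
    """Sample a parent G(n, p) and two independently sub-sampled copies.

    Each parent edge is kept in each copy independently with probability s
    (an assumed parameter). The second copy is relabelled by a hidden random
    permutation, which plays the role of the latent vertex matching.
    """
    rng = random.Random(seed)
    parent = {e for e in combinations(range(n), 2) if rng.random() < p}
    ordered_parent = sorted(parent)  # fixed order keeps sampling reproducible
    graph_a = {e for e in ordered_parent if rng.random() < s}
    kept_b = {e for e in ordered_parent if rng.random() < s}
    matching = list(range(n))
    rng.shuffle(matching)  # matching[v] is the label of vertex v in graph B
    graph_b = {tuple(sorted((matching[u], matching[v]))) for u, v in kept_b}
    return parent, graph_a, graph_b, matching


n = 50
p = n ** -0.5  # corresponds to alpha = 1/2 in p = n^(-alpha)
parent, graph_a, graph_b, matching = correlated_graph_pair(n, p, s=0.8)
print(len(parent), len(graph_a), len(graph_b))
```

A matching-recovery procedure would observe only `graph_a` and `graph_b` and try to estimate `matching`; the threshold result in the abstract concerns when a positive fraction of it can be recovered.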


Citations: 13

Statistical inference on a changing extreme value dependence structure

Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2023-08-01 DOI: 10.1214/23-aos2314

Holger Drees

We analyze the extreme value dependence of independent, not necessarily identically distributed multivariate regularly varying random vectors. More specifically, we propose estimators of the spectral measure locally at some time point and of the spectral measures integrated over time. The uniform asymptotic normality of these estimators is proved under suitable nonparametric smoothness and regularity assumptions. We then use the process convergence of the integrated spectral measure to devise consistent tests for the null hypothesis that the spectral measure does not change over time.


Citations: 2

Post-selection inference via algorithmic stability

Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2023-08-01 DOI: 10.1214/23-aos2303

Tijana Zrnic, Michael I. Jordan

When the target of statistical inference is chosen in a data-driven manner, the guarantees provided by classical theories vanish. We propose a solution to the problem of inference after selection by building on the framework of algorithmic stability, in particular its branch with origins in the field of differential privacy. Stability is achieved via randomization of selection and it serves as a quantitative measure that is sufficient to obtain nontrivial post-selection corrections for classical confidence intervals. Importantly, the underpinnings of algorithmic stability translate directly into computational efficiency—our method computes simple corrections for selective inference without recourse to Markov chain Monte Carlo sampling.


Citations: 2

Bridging factor and sparse models

Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2023-08-01 DOI: 10.1214/23-aos2304

Jianqing Fan, Ricardo Masini, Marcelo C. Medeiros

Factor and sparse models are widely used to impose a low-dimensional structure in high dimensions. However, they are seemingly mutually exclusive. We propose a lifting method that combines the merits of these two models in a supervised learning methodology that allows for efficiently exploring all the information in high-dimensional datasets. The method is based on a flexible model for high-dimensional panel data with observable and/or latent common factors and idiosyncratic components. The model is called the factor-augmented regression model. It includes principal components and sparse regression as specific models, significantly weakens the cross-sectional dependence, and facilitates model selection and interpretability. The method consists of several steps and a novel test for (partial) covariance structure in high dimensions to infer the remaining cross-section dependence at each step. We develop the theory for the model and demonstrate the validity of the multiplier bootstrap for testing a high-dimensional (partial) covariance structure. A simulation study and applications support the theory.


Citations: 0

Projected state-action balancing weights for offline reinforcement learning

Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2023-08-01 DOI: 10.1214/23-aos2302

Jiayi Wang, Zhengling Qi, Raymond K. W. Wong

Off-policy evaluation is considered a fundamental and challenging problem in reinforcement learning (RL). This paper focuses on value estimation of a target policy based on pre-collected data generated from a possibly different policy, under the framework of infinite-horizon Markov decision processes. Motivated by the recently developed marginal importance sampling method in RL and the covariate balancing idea in causal inference, we propose a novel estimator with approximately projected state-action balancing weights for the policy value estimation. We obtain the convergence rate of these weights, and show that the proposed value estimator is asymptotically normal under technical conditions. In terms of asymptotics, our results scale with both the number of trajectories and the number of decision points at each trajectory. As such, consistency can still be achieved with a limited number of subjects when the number of decision points diverges. In addition, we develop a necessary and sufficient condition for establishing the well-posedness of the operator that relates to the nonparametric Q-function estimation in the off-policy setting, which characterizes the difficulty of Q-function estimation and may be of independent interest. Numerical experiments demonstrate the promising performance of our proposed estimator.



Citations: 6

A cross-validation framework for signal denoising with applications to trend filtering, dyadic CART and beyond

Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2023-08-01 DOI: 10.1214/23-aos2283

Anamitra Chaudhuri, Sabyasachi Chatterjee

This paper formulates a general cross-validation framework for signal denoising. The general framework is then applied to nonparametric regression methods such as trend filtering and dyadic CART. The resulting cross-validated versions are then shown to attain nearly the same rates of convergence as are known for the optimally tuned analogues. No previous theoretical analyses existed for cross-validated versions of trend filtering or dyadic CART. To illustrate the generality of the framework, we also propose and study cross-validated versions of two fundamental estimators: lasso for high-dimensional linear regression and singular value thresholding for matrix estimation. Our general framework is inspired by the ideas in Chatterjee and Jafarov (2015) and is potentially applicable to a wide range of estimation methods which use tuning parameters.


Citations: 0
