
Annals of Statistics: Latest Articles

On high-dimensional Poisson models with measurement error: Hypothesis testing for nonlinear nonconvex optimization.
IF 4.5 · CAS Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2023-02-01 · DOI: 10.1214/22-aos2248
Fei Jiang, Yeqing Zhou, Jianxuan Liu, Yanyuan Ma

We study estimation and testing in the Poisson regression model with noisy high-dimensional covariates, which has wide applications in analyzing noisy big data. Correcting for the estimation bias caused by the covariate noise leads to a nonconvex target function to minimize. Handling the high dimensionality further leads us to add an amenable penalty term to the target function. We propose to estimate the regression parameter by minimizing the penalized target function. We derive the L1 and L2 convergence rates of the estimator and prove variable selection consistency. We further establish the asymptotic normality of any subset of the parameters, where the number of components in the subset may grow to infinity as long as it grows sufficiently slowly. We develop Wald and score tests based on the asymptotic normality of the estimator, which permit testing of linear functions of the members of the subset. We examine the finite-sample performance of the proposed tests through extensive simulations. Finally, the proposed method is successfully applied to the Alzheimer's Disease Neuroimaging Initiative study, which originally motivated this work.
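To see why correcting for covariate noise can make the target nonconvex, the sketch below evaluates one standard corrected Poisson loss. It assumes classical additive error W_i = X_i + U_i with U_i ~ N(0, Σ) and Σ known, so that E[exp(βᵀW_i − βᵀΣβ/2) | X_i] = exp(βᵀX_i). The function name, the Lasso penalty (standing in for a generic amenable penalty), and the pure-Python layout are illustrative assumptions, not necessarily the authors' exact target function or penalty.

```python
import math
from typing import Sequence


def corrected_poisson_loss(
    beta: Sequence[float],
    W: Sequence[Sequence[float]],
    y: Sequence[float],
    Sigma: Sequence[Sequence[float]],
    lam: float = 0.0,
) -> float:
    """Penalized, measurement-error-corrected Poisson loss (hypothetical sketch).

    Assumes W_i = X_i + U_i with U_i ~ N(0, Sigma) and Sigma known, so that
    E[exp(beta'W_i - beta'Sigma beta / 2) | X_i] = exp(beta'X_i).
    """
    p = len(beta)
    # beta' Sigma beta / 2 comes from the normal moment generating function.
    half_quad = sum(
        beta[j] * Sigma[j][k] * beta[k] for j in range(p) for k in range(p)
    ) / 2.0
    total = 0.0
    for w_i, y_i in zip(W, y):
        eta = sum(b * w for b, w in zip(beta, w_i))  # beta' W_i
        # Corrected mean term minus the linear term (already unbiased in W).
        total += math.exp(eta - half_quad) - y_i * eta
    # Lasso penalty as a stand-in for a generic amenable penalty (assumption).
    penalty = lam * sum(abs(b) for b in beta)
    return total / len(y) + penalty
```

With Sigma set to zero the function reduces to the naive Poisson negative log-likelihood (up to a term in y_i that does not depend on β). The Hessian of each corrected mean term is exp(βᵀW_i − βᵀΣβ/2)[(W_i − Σβ)(W_i − Σβ)ᵀ − Σ], which need not be positive semidefinite, so the corrected loss can be nonconvex in β.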

Annals of Statistics 51(1): 233-259 · PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10438917/pdf/nihms-1868138.pdf
Citations: 0
On High dimensional Poisson models with measurement error: hypothesis testing for nonlinear nonconvex optimization
IF 4.5 · CAS Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2022-12-31 · DOI: 10.48550/arXiv.2301.00139
Fei Jiang, Yeqing Zhou, Jianxuan Liu, Yanyuan Ma
We study estimation and testing in the Poisson regression model with noisy high-dimensional covariates, which has wide applications in analyzing noisy big data. Correcting for the estimation bias caused by the covariate noise leads to a nonconvex target function to minimize. Handling the high dimensionality further leads us to add an amenable penalty term to the target function. We propose to estimate the regression parameter by minimizing the penalized target function. We derive the L1 and L2 convergence rates of the estimator and prove variable selection consistency. We further establish the asymptotic normality of any subset of the parameters, where the number of components in the subset may grow to infinity as long as it grows sufficiently slowly. We develop Wald and score tests based on the asymptotic normality of the estimator, which permit testing of linear functions of the members of the subset. We examine the finite-sample performance of the proposed tests through extensive simulations. Finally, the proposed method is successfully applied to the Alzheimer's Disease Neuroimaging Initiative study, which originally motivated this work.
Annals of Statistics 51(1): 233-259 (arXiv preprint record)
Citations: 1
BATCH POLICY LEARNING IN AVERAGE REWARD MARKOV DECISION PROCESSES.
IF 4.5 · CAS Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2022-12-01 · DOI: 10.1214/22-aos2231
Peng Liao, Zhengling Qi, Runzhe Wan, Predrag Klasnja, Susan A Murphy

We consider the batch (offline) policy learning problem in infinite-horizon Markov decision processes. Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward. We propose a doubly robust estimator for the average reward and show that it achieves semiparametric efficiency. Further, we develop an optimization algorithm to compute the optimal policy in a parameterized stochastic policy class. The performance of the estimated policy is measured by its regret, i.e., the difference between the optimal average reward in the policy class and the average reward of the estimated policy, and we establish a finite-sample regret guarantee. The performance of the method is illustrated by simulation studies and an analysis of a mobile health study promoting physical activity.

Annals of Statistics 50(6): 3364-3387 · PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10072865/pdf/nihms-1837036.pdf
Citations: 55
LINEAR BIOMARKER COMBINATION FOR CONSTRAINED CLASSIFICATION.
IF 4.5 · CAS Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2022-10-01 · Epub Date: 2022-10-27 · DOI: 10.1214/22-aos2210
Yijian Huang, Martin G Sanda

Multiple biomarkers are often combined to improve disease diagnosis. The uniformly optimal combination, i.e., optimal with respect to all reasonable performance metrics, unfortunately requires excessive distributional modeling, to which the estimation can be sensitive. An alternative strategy is to pursue local optimality with respect to a specific performance metric. Nevertheless, existing methods may not target the clinical utility of the intended medical test, which usually needs to operate above a certain sensitivity or specificity level, or their statistical properties have not been well studied and understood. In this article, we develop and investigate a linear combination method that empirically maximizes the clinical utility for such constrained classification. The combination coefficient is shown to have cube-root asymptotics. The convergence rate and limiting distribution of the predictive performance are subsequently established, exhibiting the robustness of the method in comparison with others. An algorithm with sound statistical justification is devised for efficient, high-quality computation. Simulations corroborate the theoretical results and demonstrate good statistical and computational performance. The method is illustrated with a clinical study on the detection of aggressive prostate cancer.
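As a concrete picture of the constrained criterion (not the authors' algorithm, which the abstract does not detail), the sketch below brute-forces a two-marker linear combination: for each direction on a grid of angles it sets the threshold so that empirical specificity is at least a target level, then records the resulting empirical sensitivity. The function names, the specificity-constrained formulation, the angle grid, and the rule "positive if score > threshold" are illustrative assumptions.

```python
import math
from typing import Optional, Sequence, Tuple


def sensitivity_at_specificity(
    case_scores: Sequence[float], control_scores: Sequence[float], spec: float
) -> float:
    """Empirical sensitivity when the threshold keeps specificity >= spec.

    A subject is called positive when its score exceeds the threshold.
    """
    ctrl = sorted(control_scores)
    # Number of controls that must fall at or below the threshold
    # (the small epsilon guards against floating-point round-up in spec * m).
    k = math.ceil(spec * len(ctrl) - 1e-12)
    threshold = -math.inf if k <= 0 else ctrl[k - 1]
    return sum(1 for s in case_scores if s > threshold) / len(case_scores)


def best_linear_combination(
    cases: Sequence[Tuple[float, float]],
    controls: Sequence[Tuple[float, float]],
    spec: float,
    n_angles: int = 360,
) -> Tuple[float, Optional[float]]:
    """Grid search over unit directions (cos t, sin t) with t in [0, 2*pi)."""
    best_sens, best_theta = -1.0, None
    for i in range(n_angles):
        theta = 2.0 * math.pi * i / n_angles
        a1, a2 = math.cos(theta), math.sin(theta)
        case_scores = [a1 * x1 + a2 * x2 for x1, x2 in cases]
        ctrl_scores = [a1 * x1 + a2 * x2 for x1, x2 in controls]
        sens = sensitivity_at_specificity(case_scores, ctrl_scores, spec)
        if sens > best_sens:  # keep the first direction attaining the maximum
            best_sens, best_theta = sens, theta
    return best_sens, best_theta
```

Because empirical sensitivity is a step function of the direction, the criterion is nonsmooth, which is the kind of objective that typically leads to cube-root asymptotics; a plain grid search like this also scales poorly beyond a few markers.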

Annals of Statistics 50(5): 2793-2815 · PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9635489/pdf/nihms-1819429.pdf
Citations: 1
DOUBLY DEBIASED LASSO: HIGH-DIMENSIONAL INFERENCE UNDER HIDDEN CONFOUNDING.
IF 4.5 · CAS Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2022-06-01 · Epub Date: 2022-06-16 · DOI: 10.1214/21-aos2152
Zijian Guo, Domagoj Ćevid, Peter Bühlmann

Inferring causal relationships or related associations from observational data can be invalidated by the existence of hidden confounding. We focus on a high-dimensional linear regression setting where the measured covariates are affected by hidden confounding, and propose the Doubly Debiased Lasso estimator for individual components of the regression coefficient vector. Our method simultaneously corrects both the bias due to estimation of high-dimensional parameters and the bias caused by the hidden confounding. We establish its asymptotic normality and also prove that it is efficient in the Gauss-Markov sense. The validity of our methodology relies on a dense confounding assumption, i.e., that every confounding variable affects many covariates. The finite-sample performance is illustrated with an extensive simulation study and a genomic application.

Annals of Statistics 50(3): 1320-1347 · PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9365063/pdf/nihms-1824950.pdf
Citations: 23
OPTIMAL FALSE DISCOVERY RATE CONTROL FOR LARGE SCALE MULTIPLE TESTING WITH AUXILIARY INFORMATION.
IF 4.5 · CAS Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2022-04-01 · DOI: 10.1214/21-aos2128
Hongyuan Cao, Jun Chen, Xianyang Zhang

Large-scale multiple testing is a fundamental problem in high-dimensional statistical inference. Various types of auxiliary information reflecting the structural relationships among the hypotheses are increasingly available, and exploiting such information can boost statistical power. To this end, we propose a framework based on a two-group mixture model in which the prior probability of being null varies across hypotheses, with a shape-constrained relationship imposed between the auxiliary information and these prior probabilities. An optimal rejection rule is designed to maximize the expected number of true positives while the average false discovery rate is controlled. Focusing on the ordered structure, we develop a robust EM algorithm to simultaneously estimate the prior probabilities of being null and the distribution of p-values under the alternative hypothesis. We show, both empirically and theoretically, that the proposed method has better power than state-of-the-art competitors while controlling the false discovery rate. Extensive simulations demonstrate the advantage of the proposed method. Datasets from genome-wide association studies are used to illustrate the new methodology.
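The sketch below shows how hypothesis-specific prior null probabilities can enter a rejection rule. It uses generic local-fdr thresholding (reject the hypotheses with the smallest local fdr while their running average stays at or below α), a standard oracle-style rule that is not necessarily the authors' exact procedure. It assumes the prior null probabilities π0_i (which the paper estimates from auxiliary information via EM) and the alternative p-value density f1 are already given; the Beta(0.2, 1) density and both function names are hypothetical choices.

```python
from typing import Callable, List, Sequence


def lfdr_rejections(
    pvalues: Sequence[float],
    pi0: Sequence[float],
    f1: Callable[[float], float],
    alpha: float,
) -> List[int]:
    """Indices rejected by local-fdr thresholding with hypothesis-specific priors.

    Null p-values are Uniform(0, 1), so the null density f0(p) equals 1.
    """
    lfdr = [w / (w + (1.0 - w) * f1(p)) for p, w in zip(pvalues, pi0)]
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    rejected: List[int] = []
    running_sum = 0.0
    for k, i in enumerate(order, start=1):
        running_sum += lfdr[i]
        # The running mean of ascending lfdr values never decreases,
        # so the first violation ends the rejection set.
        if running_sum / k > alpha:
            break
        rejected.append(i)
    return sorted(rejected)


def beta_alt_density(p: float) -> float:
    """Hypothetical alternative density Beta(0.2, 1), decreasing in p."""
    return 0.2 * p ** -0.8
```

For example, `lfdr_rejections([0.001, 0.5, 0.01, 0.9], [0.5] * 4, beta_alt_density, 0.1)` rejects hypotheses 0 and 2, while raising π0 for hypothesis 2 to 0.9 (as auxiliary information suggesting it is likely null might) removes it from the rejection set. This is one way prior probabilities redistribute power across hypotheses.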

Annals of Statistics 50(2): 807-857 · PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10153594/pdf/nihms-1840915.pdf
Citations: 15
FUNCTIONAL SUFFICIENT DIMENSION REDUCTION THROUGH AVERAGE FRÉCHET DERIVATIVES.
IF 4.5 · CAS Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2022-04-01 · Epub Date: 2022-04-07 · DOI: 10.1214/21-aos2131
Kuang-Yao Lee, Lexin Li

Sufficient dimension reduction (SDR) embodies a family of methods that aim to reduce dimensionality without loss of information in a regression setting. In this article, we propose a new method for nonparametric function-on-function SDR, where both the response and the predictor are functions. We first develop the notions of the functional central mean subspace and the functional central subspace, which form the population targets of our functional SDR. We then introduce an average Fréchet derivative estimator, which extends the gradient of the regression function to the operator level and enables us to develop estimators for our functional dimension reduction spaces. We show that the resulting functional SDR estimators are unbiased and exhaustive and, more importantly, that this holds without imposing any distributional assumptions such as the linearity or constant variance conditions commonly imposed by all existing functional SDR methods. We establish the uniform convergence of the estimators for the functional dimension reduction spaces, while allowing both the number of Karhunen-Loève expansions and the intrinsic dimension to diverge with the sample size. We demonstrate the efficacy of the proposed methods through simulations and two real data examples.

Annals of Statistics 50(2): 904-929 · PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10085580/pdf/nihms-1746366.pdf
Citations: 0
SEMIPARAMETRIC LATENT-CLASS MODELS FOR MULTIVARIATE LONGITUDINAL AND SURVIVAL DATA.
IF 4.5 · CAS Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2022-02-01 · Epub Date: 2022-02-16 · DOI: 10.1214/21-aos2117
Kin Yau Wong, Donglin Zeng, D Y Lin

In long-term follow-up studies, data are often collected on repeated measures of multivariate response variables as well as on time to the occurrence of a certain event. To jointly analyze such longitudinal data and survival time, we propose a general class of semiparametric latent-class models that accommodates a heterogeneous study population with flexible dependence structures between the longitudinal and survival outcomes. We combine nonparametric maximum likelihood estimation with sieve estimation and devise an efficient EM algorithm to implement the proposed approach. We establish the asymptotic properties of the proposed estimators through novel use of modern empirical process theory, sieve estimation theory, and semiparametric efficiency theory. Finally, we demonstrate the advantages of the proposed methods through extensive simulation studies and provide an application to the Atherosclerosis Risk in Communities study.

Annals of Statistics 50(1): 487-510 · PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9269993/pdf/nihms-1764505.pdf
Citations: 0
BRIDGING CONVEX AND NONCONVEX OPTIMIZATION IN ROBUST PCA: NOISE, OUTLIERS, AND MISSING DATA.
IF 3.2 · CAS Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2021-10-01 · Epub Date: 2021-11-12 · DOI: 10.1214/21-aos2066
Yuxin Chen, Jianqing Fan, Cong Ma, Yuling Yan

This paper delivers improved theoretical guarantees for the convex programming approach in low-rank matrix estimation, in the presence of (1) random noise, (2) gross sparse outliers, and (3) missing data. This problem, often dubbed robust principal component analysis (robust PCA), finds applications in various domains. Despite the wide applicability of convex relaxation, the available statistical support (particularly the stability analysis vis-à-vis random noise) remains highly suboptimal; we strengthen it in this paper. When the unknown matrix is well-conditioned, incoherent, and of constant rank, we demonstrate that a principled convex program achieves near-optimal statistical accuracy, in terms of both the Euclidean loss and the ℓ∞ loss. All of this holds even when nearly a constant fraction of observations are corrupted by outliers of arbitrary magnitude. The key analysis idea lies in bridging the convex program in use with an auxiliary nonconvex optimization algorithm, hence the title of this paper.

Annals of Statistics 49(5): 2948-2971 · PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9491514/pdf/nihms-1782570.pdf
Citations: 0
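The robust PCA abstract above concerns a convex program for low-rank estimation under random noise, sparse outliers and missing entries. This sketch assumes the penalized form ½‖P_Ω(L + S − M)‖_F² + λ‖L‖_* + τ‖S‖_1, where Ω is the set of observed entries. That is a standard way to write such a program, but the paper's exact formulation and its recommended choices of λ and τ should be checked against the original.

The solver is plain proximal gradient descent, which is our choice and not necessarily the authors'. Each step applies singular-value thresholding to L and entrywise soft-thresholding to S. All function names, the simulated data and the values of `lam` and `tau` are hypothetical.

```python
import numpy as np


def soft_threshold(A, t):
    """Entrywise soft-thresholding: the proximal map of t * ||A||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)


def svt(A, t):
    """Singular value thresholding: the proximal map of t * ||A||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt


def objective(L, S, M, mask, lam, tau):
    """0.5 * ||P_Omega(L + S - M)||_F^2 + lam * ||L||_* + tau * ||S||_1."""
    resid = mask * (L + S - M)
    return (0.5 * np.sum(resid ** 2)
            + lam * np.linalg.norm(L, "nuc")
            + tau * np.abs(S).sum())


def robust_pca_prox_grad(M, mask, lam, tau, n_iter=300):
    """Proximal gradient descent on the objective above.

    `mask` is a 0/1 array marking observed entries (the projection P_Omega).
    The smooth term has the same gradient R = P_Omega(L + S - M) in L and in S.
    That joint gradient is 2-Lipschitz, so the step 1/2 never increases the objective.
    """
    M = np.where(mask > 0, M, 0.0)  # missing entries (e.g. NaN) never enter the fit
    L = np.zeros_like(M, dtype=float)
    S = np.zeros_like(M, dtype=float)
    step = 0.5
    history = [objective(L, S, M, mask, lam, tau)]
    for _ in range(n_iter):
        R = mask * (L + S - M)
        L, S = (svt(L - step * R, step * lam),
                soft_threshold(S - step * R, step * tau))
        history.append(objective(L, S, M, mask, lam, tau))
    return L, S, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, r = 60, 2
    L_true = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))  # rank-r signal
    outliers = rng.random((n, n)) < 0.05  # 5% grossly corrupted entries
    S_true = np.where(outliers, rng.normal(scale=20.0, size=(n, n)), 0.0)
    mask = (rng.random((n, n)) < 0.8).astype(float)  # 80% of entries observed
    M = L_true + S_true + 0.01 * rng.normal(size=(n, n))  # small random noise
    M[mask == 0] = np.nan  # unobserved entries
    L_hat, S_hat, hist = robust_pca_prox_grad(M, mask, lam=1.0, tau=0.15)
    rel_err = np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true)
    print(f"objective {hist[0]:.1f} -> {hist[-1]:.1f}, relative error of L: {rel_err:.3f}")
```

How well `L_hat` recovers the low-rank part depends heavily on `lam` and `tau`. The values in the demo are untuned placeholders. The only property the step size guarantees is that the objective history never increases.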
BOOSTED NONPARAMETRIC HAZARDS WITH TIME-DEPENDENT COVARIATES.
IF 4.5 | Zone 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2021-08-01 | Epub Date: 2021-09-29 | DOI: 10.1214/20-aos2028
Donald K K Lee, Ningyuan Chen, Hemant Ishwaran

Given functional data from a survival process with time-dependent covariates, we derive a smooth convex representation for its nonparametric log-likelihood functional and obtain its functional gradient. From this we devise a generic gradient boosting procedure for estimating the hazard function nonparametrically. An illustrative implementation of the procedure using regression trees is described to show how to recover the unknown hazard. The generic estimator is consistent if the model is correctly specified; alternatively, an oracle inequality can be demonstrated for tree-based models. To avoid overfitting, boosting employs several regularization devices. One of them is step-size restriction, but the rationale for this is somewhat mysterious from the viewpoint of consistency. Our work brings some clarity to this issue by revealing that step-size restriction is a mechanism for preventing the curvature of the risk from derailing convergence.
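The abstract above describes gradient boosting of the hazard along the functional gradient of the log-likelihood, illustrated with regression trees. What follows is a heavily simplified, hypothetical sketch, not the authors' implementation. It rests on these assumptions:

- The data are in counting-process form, with one row (start, stop, x, event) per interval on which the covariate is constant.
- The log-hazard F(t, x) is treated as constant on each row by evaluating it at the interval midpoint.
- The base learners are depth-one trees (stumps).
- The risk is the discretized negative log-likelihood, the mean over rows of (stop − start)·exp(F) − event·F.

Each row's negative gradient is event − (stop − start)·exp(F), and each stump is fitted to it by least squares. The parameter `step` plays the role of the step-size restriction discussed in the abstract. The simulation helper and every function name are illustrative.

```python
import numpy as np


def simulate_rows(n_subjects, rng, base_rate=0.5, log_hr=1.0, censor=3.0):
    """Hypothetical data: x(t) switches from 0 to 1 at a uniform random time and
    the true hazard is base_rate * exp(log_hr * x(t)); follow-up ends at `censor`.
    Returns counting-process rows (start, stop, x, event)."""
    rows = []
    for _ in range(n_subjects):
        switch = rng.uniform(0.0, censor)
        t1 = rng.exponential(1.0 / base_rate)
        if t1 < switch:  # event while x = 0
            rows.append((0.0, t1, 0.0, 1.0))
            continue
        rows.append((0.0, switch, 0.0, 0.0))
        # Memorylessness: restart the exponential clock at the switch time.
        t2 = switch + rng.exponential(1.0 / (base_rate * np.exp(log_hr)))
        rows.append((switch, min(t2, censor), 1.0, float(t2 < censor)))
    return np.array(rows)


def risk(F, dur, event):
    """Discretized negative log-likelihood, with F constant on each row."""
    return np.mean(dur * np.exp(F) - event * F)


def fit_stump(X, g):
    """Least-squares regression stump for pseudo-responses g.
    Returns (feature, threshold, left_value, right_value); feature None = constant."""
    n = len(g)
    best = (None, 0.0, g.mean(), g.mean())
    best_sse = np.sum((g - g.mean()) ** 2)
    k = np.arange(1, n)  # size of the left child for each candidate split
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j], kind="stable")
        xs, gs = X[order, j], g[order]
        sl = np.cumsum(gs)[:-1]  # left-child sums
        sr = gs.sum() - sl  # right-child sums
        sse = np.sum(gs ** 2) - sl ** 2 / k - sr ** 2 / (n - k)
        sse[xs[1:] == xs[:-1]] = np.inf  # only split between distinct values
        i = int(np.argmin(sse))
        if sse[i] < best_sse:
            best_sse = sse[i]
            best = (j, 0.5 * (xs[i] + xs[i + 1]), sl[i] / (i + 1), sr[i] / (n - i - 1))
    return best


def predict_stump(stump, X):
    j, thr, left, right = stump
    if j is None:
        return np.full(X.shape[0], left)
    return np.where(X[:, j] <= thr, left, right)


def boost_log_hazard(rows, n_iter=200, step=0.1):
    """Boost F(t, x) = log-hazard with stumps and step-size restriction `step`."""
    start, stop, x, event = rows.T
    dur = stop - start
    X = np.column_stack([0.5 * (start + stop), x])  # features: midpoint time, covariate
    f0 = np.log(event.sum() / dur.sum())  # constant-hazard MLE as the starting fit
    F = np.full(len(dur), f0)
    stumps, risks = [], [risk(F, dur, event)]
    for _ in range(n_iter):
        g = event - dur * np.exp(F)  # negative gradient of the risk, up to a 1/n factor
        stump = fit_stump(X, g)  # leaf means of g, so <g, h> = ||h||^2: a descent direction
        F = F + step * predict_stump(stump, X)
        stumps.append(stump)
        risks.append(risk(F, dur, event))
    return f0, stumps, risks


def log_hazard(f0, stumps, step, t, x):
    """Evaluate the boosted log-hazard at time(s) t and covariate value(s) x."""
    X = np.column_stack([np.atleast_1d(t), np.atleast_1d(x)]).astype(float)
    return f0 + step * sum(predict_stump(s, X) for s in stumps)


if __name__ == "__main__":
    rows = simulate_rows(400, np.random.default_rng(0))
    f0, stumps, risks = boost_log_hazard(rows)
    print("risk:", risks[0], "->", risks[-1])
    gap = log_hazard(f0, stumps, 0.1, 1.0, 1.0) - log_hazard(f0, stumps, 0.1, 1.0, 0.0)
    print("estimated log-hazard ratio at t = 1:", gap[0], "(simulated truth: 1.0)")
```

Each stump splits on a single feature, so the fitted log-hazard is additive in time and covariate. Deeper trees would be needed for interactions. The midpoint approximation would also have to be replaced by splitting rows at the tree's time cut points to match the exact likelihood.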

Annals of Statistics, 49(4), 2101-2128 (published 2021-08-01). DOI: 10.1214/20-aos2028. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8691747/pdf/nihms-1683276.pdf
Citations: 0
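The boosted-hazards abstract says step-size restriction keeps the curvature of the risk from derailing convergence. A toy one-dimensional example makes that concrete; it is our own illustration, not the paper's analysis.

Take the Poisson-type risk r(f) = exp(f) − a·f, with minimizer log(a) and curvature exp(f), and run plain gradient descent f ← f − step·(exp(f) − a). At the minimizer the curvature equals a, so the iteration is locally stable only if step < 2/a. The function name and the choice a = 5 are hypothetical.

```python
import math


def gradient_descent(a, step, f0=0.0, n_iter=50):
    """Gradient descent on r(f) = exp(f) - a*f, a one-dimensional Poisson-type risk
    with minimizer log(a) and curvature exp(f)."""
    path = [f0]
    f = f0
    for _ in range(n_iter):
        f = f - step * (math.exp(f) - a)  # r'(f) = exp(f) - a
        path.append(f)
    return path


if __name__ == "__main__":
    a = 5.0  # curvature at the minimizer log(5) is 5, so steps below 2/5 converge locally
    for step in (0.1, 1.0):
        path = gradient_descent(a, step)
        print(f"step={step}: first iterates {[round(v, 2) for v in path[:4]]}, "
              f"final {path[-1]:.4f} (target {math.log(a):.4f})")
```

With step 1.0 the iterates go 0 → 4 → about −45.6. The step looks reasonable at f = 0, where the curvature is 1, but is far too large at f = 4, where the curvature is e⁴ ≈ 54.6. With step 0.1 the iterates climb steadily to log(5).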