
Annals of Statistics: Latest Articles

Sharp minimax distribution estimation for current status censoring with or without missing
IF 4.5 Zone 1 (Mathematics) Q1 Mathematics Pub Date: 2021-02-01 DOI: 10.1214/20-AOS1970
S. Efromovich
Nonparametric estimation of the cumulative distribution function and the probability density of a lifetime $X$ modified by a current status censoring (CSC), including cases of right and left missing data, is a classical ill-posed problem with biased data. The biased nature of CSC data may preclude us from consistent estimation unless the biasing function is known or may be estimated, and its ill-posed nature slows down rates of convergence. Under a traditionally studied CSC, we observe a sample from $(Z,\Delta)$ where a continuous monitoring time $Z$ is independent of $X$, $\Delta := I(X \leq Z)$ is the status, and the bias of observations is created by the density of $Z$ which is estimable. In presence of right or left missing, we observe corresponding samples from $(\Delta Z,\Delta)$ or $((1-\Delta)Z,\Delta)$; the data are again biased but now the density of $Z$ cannot be estimated from the data. As a result, to solve the estimation problem, either the density of $Z$ must be known (like in a controlled study) or an extra cross-sectional sampling of $Z$, which is typically simpler than an underlying CSC study, be conducted. The main aim of the paper is to develop for this biased and ill-posed problem the theory of efficient (sharp-minimax) estimation which is inspired by known results for the case of directly observed $X$. Among interesting aspects of the developed theory: (i) While sharp-minimax analysis of missing CSC may follow the classical Pinsker’s methodology, analysis of CSC requires a more complicated estimation procedure based on a special smoothing in both frequency and time domains; (ii) Efficient estimation requires solving an old-standing problem of approximating aperiodic Sobolev functions; (iii) If smoothness of the cdf of $X$ is known, then its rate-minimax estimation is possible even if the density of $Z$ is rougher. Real and simulated examples, as well as extensions of the core models to dependent $X$ and $Z$ and case-control CSC, are presented.
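To make the sampling scheme in this abstract concrete, the following is a minimal simulation sketch, not the paper's estimator. It assumes a hypothetical lifetime $X \sim \mathrm{Exp}(1)$ and a hypothetical monitoring time $Z \sim \mathrm{Uniform}(0, 3)$, neither of which comes from the paper. Because $Z$ is independent of $X$, the mean of $\Delta$ among observations with $Z$ near $z$ approximates the cdf of $X$ at $z$, which the crude binned average below illustrates.

```python
import random

def simulate_csc(n, seed=0):
    """Draw n current-status observations (Z, Delta) with Delta = I(X <= Z)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.expovariate(1.0)   # hypothetical lifetime X ~ Exp(1); never observed directly
        z = rng.uniform(0.0, 3.0)  # hypothetical monitoring time Z ~ Uniform(0, 3), independent of X
        sample.append((z, 1 if x <= z else 0))
    return sample

def binned_cdf(sample, lo, hi):
    """Fraction of Delta = 1 among observations whose Z falls in [lo, hi)."""
    deltas = [d for z, d in sample if lo <= z < hi]
    return sum(deltas) / len(deltas)

sample = simulate_csc(20000)

# Crude local average approximating the cdf of X at z = 1, i.e. 1 - exp(-1).
estimate = binned_cdf(sample, 0.9, 1.1)

# Right-missing variant: only (Delta * Z, Delta) is recorded, so Z is lost when Delta = 0.
right_missing = [(d * z, d) for z, d in sample]
```

In the right-missing list every record with $\Delta = 0$ carries a zero in place of $Z$, which is why the density of $Z$ can no longer be estimated from such data alone.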
Citations: 1
Inference for conditional value-at-risk of a predictive regression
IF 4.5 Zone 1 (Mathematics) Q1 Mathematics Pub Date: 2020-12-01 DOI: 10.1214/19-aos1937
Yi He, Yanxi Hou, L. Peng, Haipeng Shen
Conditional value-at-risk is a popular risk measure in risk management. We study the inference problem of conditional value-at-risk under a linear predictive regression model. We derive the asymptotic distribution of the least squares estimator for the conditional value-at-risk. Our results relax the model assumptions made in Chun et al. (2012) and correct their mistake in the asymptotic variance expression. We show that the asymptotic variance depends on the quantile density function of the unobserved error and whether the model has a predictor with infinite variance, which makes it challenging to actually quantify the uncertainty of the conditional risk measure. To make the inference feasible, we then propose a smooth empirical likelihood based method for constructing a confidence interval for the conditional value-at-risk based on either independent errors or GARCH errors. Our approach not only bypasses the challenge of directly estimating the asymptotic variance but also does not need to know whether there exists an infinite variance predictor in the predictive model. Furthermore, we apply the same idea to the quantile regression method, which allows infinite variance predictors and generalizes the parameter estimation in Whang (2006) to conditional value-at-risk in the supplementary material. We demonstrate the finite sample performance of the derived confidence intervals through numerical studies before applying them to real data.
Citations: 10
Nonparametric drift estimation for i.i.d. paths of stochastic differential equations
IF 4.5 Zone 1 (Mathematics) Q1 Mathematics Pub Date: 2020-12-01 DOI: 10.1214/19-aos1933
F. Comte, V. Genon-Catalot
By Fabienne Comte and Valentine Genon-Catalot (Université de Paris, MAP5, CNRS, F-75006, France). We consider $N$ independent stochastic processes $(X_i(t), t \in [0, T])$, $i = 1, \ldots, N$, defined by a one-dimensional stochastic differential equation, which are continuously observed throughout a time interval $[0, T]$ where $T$ is fixed. We study nonparametric estimation of the drift function on a given subset $A$ of $\mathbb{R}$. Projection estimators are defined on finite dimensional subsets of $L^2(A, dx)$. We stress that the set $A$ may be compact or not and the diffusion coefficient may be bounded or not. A data-driven procedure to select the dimension of the projection space is proposed where the dimension is chosen within a random collection of models. Upper bounds of risks are obtained, the assumptions are discussed and simulation experiments are reported.
Citations: 22
Irreducibility and geometric ergodicity of Hamiltonian Monte Carlo
IF 4.5 Zone 1 (Mathematics) Q1 Mathematics Pub Date: 2020-12-01 DOI: 10.1214/19-aos1941
Alain Durmus, É. Moulines, E. Saksman
Hamiltonian Monte Carlo (HMC) is currently one of the most popular Markov Chain Monte Carlo algorithms to sample smooth distributions over continuous state space. This paper discusses the irreducibility and geometric ergodicity of the HMC algorithm. We consider cases where the number of steps of the Störmer-Verlet integrator is either fixed or random. Under mild conditions on the potential $U$ associated with target distribution $\pi$, we first show that the Markov kernel associated to the HMC algorithm is irreducible and positive recurrent. Under more stringent conditions, we then establish that the Markov kernel is Harris recurrent. We provide verifiable conditions on $U$ under which the HMC sampler is geometrically ergodic. Finally, we illustrate our results on several examples.
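For readers unfamiliar with the algorithm analyzed here, the following is a minimal one-dimensional sketch of HMC with a fixed number of Störmer-Verlet (leapfrog) steps. The target is an assumed standard normal, so the potential is $U(q) = q^2/2$; the step size, number of steps, and function names are illustrative choices, not taken from the paper.

```python
import math
import random

def leapfrog(q, p, grad_u, step, n_steps):
    """Stormer-Verlet (leapfrog) integration of the Hamiltonian dynamics."""
    p -= 0.5 * step * grad_u(q)          # initial half step for the momentum
    for i in range(n_steps):
        q += step * p                    # full step for the position
        if i < n_steps - 1:
            p -= step * grad_u(q)        # full momentum step between position updates
    p -= 0.5 * step * grad_u(q)          # final half step for the momentum
    return q, p

def hmc_chain(u, grad_u, q0, n_iter, step=0.2, n_steps=10, seed=0):
    """HMC with a fixed number of leapfrog steps and a Metropolis correction."""
    rng = random.Random(seed)
    q, chain = q0, []
    for _ in range(n_iter):
        p = rng.gauss(0.0, 1.0)          # refresh the momentum from N(0, 1)
        q_new, p_new = leapfrog(q, p, grad_u, step, n_steps)
        h_old = u(q) + 0.5 * p * p
        h_new = u(q_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new                    # accept the proposal, otherwise keep q
        chain.append(q)
    return chain

# Assumed target: standard normal, so U(q) = q^2 / 2 and U'(q) = q.
chain = hmc_chain(lambda q: 0.5 * q * q, lambda q: q, q0=3.0, n_iter=5000)
mean = sum(chain) / len(chain)
var = sum((x - mean) ** 2 for x in chain) / len(chain)
```

The random-step variant mentioned in the abstract could be imitated by drawing `n_steps` afresh in each iteration; that change is left out to keep the sketch short.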
Citations: 11
Fréchet change-point detection
IF 4.5 Zone 1 (Mathematics) Q1 Mathematics Pub Date: 2020-12-01 DOI: 10.1214/19-AOS1930
Paromita Dubey, H. Müller
Citations: 26
Assessment of the extent of corroboration of an elaborate theory of a causal hypothesis using partial conjunctions of evidence factors
IF 4.5 Zone 1 (Mathematics) Q1 Mathematics Pub Date: 2020-12-01 DOI: 10.1214/19-aos1929
B. Karmakar, Dylan S. Small
An elaborate theory of predictions of a causal hypothesis consists of several falsifiable statements derived from the causal hypothesis. Statistical tests for the various pieces of the elaborate theory help to clarify how much the causal hypothesis is corroborated. In practice, the degree of corroboration of the causal hypothesis has been assessed by a verbal description of which of the several tests provides evidence for which of the several predictions. This verbal approach can miss quantitative patterns. In this paper, we develop a quantitative approach. We first decompose these various tests of the predictions into independent factors with different sources of potential biases. Support for the causal hypothesis is enhanced when many of these evidence factors support the predictions. A sensitivity analysis is used to assess the potential bias that could make the finding of the tests spurious. Along with this multi-parameter sensitivity analysis, we consider the partial conjunctions of the tests. These partial conjunctions quantify the evidence supporting various fractions of the collection of predictions. A partial conjunction test involves combining tests of the components in the partial conjunction. We find the asymptotically optimal combination of tests in the context of a sensitivity analysis. Our analysis of an elaborate theory of a causal hypothesis controls for the familywise error rate.
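As a rough illustration of what a partial conjunction test computes, the sketch below uses a simple Bonferroni-style combination in the sense of Benjamini and Heller (2008): to test whether at least `u` of `n` predictions are supported, it combines the `n - u + 1` largest p-values. This is an assumption made for illustration; it is not the asymptotically optimal combination derived in the paper, and the p-values are made up.

```python
def partial_conjunction_pvalue(pvalues, u):
    """Bonferroni-style p-value for 'at least u of the n hypotheses are false'.

    Combines the n - u + 1 largest p-values, which reduces to
    (n - u + 1) times the u-th smallest p-value, capped at 1.
    """
    n = len(pvalues)
    if not 1 <= u <= n:
        raise ValueError("u must be between 1 and the number of p-values")
    p_u = sorted(pvalues)[u - 1]         # u-th smallest p-value
    return min(1.0, (n - u + 1) * p_u)

# Hypothetical p-values for four evidence factors (illustrative only).
factor_pvalues = [0.001, 0.01, 0.04, 0.5]
pc_pvalues = [partial_conjunction_pvalue(factor_pvalues, u) for u in range(1, 5)]
```

With `u = 1` this is the usual Bonferroni test that at least one prediction is supported, and with `u = n` it is the maximum p-value, so the sequence shows how the evidence weakens as a larger fraction of the predictions is required to hold.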
Citations: 7
Test of significance for high-dimensional longitudinal data
IF 4.5 Zone 1 (Mathematics) Q1 Mathematics Pub Date: 2020-10-01 Epub Date: 2020-09-19 DOI: 10.1214/19-aos1900
Ethan X Fang, Yang Ning, Runze Li

This paper concerns statistical inference for longitudinal data with ultrahigh dimensional covariates. We first study the problem of constructing confidence intervals and hypothesis tests for a low dimensional parameter of interest. The major challenge is how to construct a powerful test statistic in the presence of high-dimensional nuisance parameters and sophisticated within-subject correlation of longitudinal data. To deal with the challenge, we propose a new quadratic decorrelated inference function approach, which simultaneously removes the impact of nuisance parameters and incorporates the correlation to enhance the efficiency of the estimation procedure. When the parameter of interest is of fixed dimension, we prove that the proposed estimator is asymptotically normal and attains the semiparametric information bound, based on which we can construct an optimal Wald test statistic. We further extend this result and establish the limiting distribution of the estimator under the setting with the dimension of the parameter of interest growing with the sample size at a polynomial rate. Finally, we study how to control the false discovery rate (FDR) when a vector of high-dimensional regression parameters is of interest. We prove that applying Storey's (2002) procedure to the proposed test statistics for each regression parameter controls the FDR asymptotically in longitudinal data. We conduct simulation studies to assess the finite sample performance of the proposed procedures. Our simulation results imply that the newly proposed procedure can control both the Type I error for testing a low dimensional parameter of interest and the FDR in the multiple testing problem. We also apply the proposed procedure to a real data example.
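The FDR step mentioned in this abstract can be sketched as follows, under the assumption that per-parameter p-values are already available. The code follows the point-estimate approach usually attributed to Storey (2002): estimate the null proportion from the p-values above a tuning value `lam`, then reject the largest set whose estimated FDR stays below `alpha`. The function name, the choice `lam = 0.5`, and the p-values are hypothetical; the paper's decorrelated test statistics are not reproduced.

```python
def storey_reject(pvalues, alpha=0.05, lam=0.5):
    """Estimate the null proportion and return the indices rejected at level alpha."""
    m = len(pvalues)
    # Null-proportion estimate: share of p-values above lam, rescaled by (1 - lam).
    pi0 = min(1.0, sum(1 for p in pvalues if p > lam) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_star = 0
    for rank, i in enumerate(order, start=1):
        # Estimated FDR when rejecting the `rank` smallest p-values.
        if pi0 * m * pvalues[i] / rank <= alpha:
            k_star = rank
    return pi0, sorted(order[:k_star])

# Hypothetical p-values, one per regression parameter (illustrative only).
pvals = [0.001, 0.002, 0.003, 0.004, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9]
pi0_hat, rejected = storey_reject(pvals)
```

When the estimated null proportion equals 1 the rule coincides with the Benjamini-Hochberg step-up procedure; a smaller estimate makes the rule less conservative.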

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8277154/pdf/nihms-1614211.pdf
Citations: 0
Hypothesis testing for high-dimensional time series via self-normalization
IF 4.5 Zone 1 (Mathematics) Q1 Mathematics Pub Date: 2020-10-01 DOI: 10.1214/19-AOS1904
Runmin Wang, X. Shao
Citations: 18
Valid post-selection inference in model-free linear regression
IF 4.5 Zone 1 (Mathematics) Q1 Mathematics Pub Date: 2020-10-01 DOI: 10.1214/19-AOS1917
Arun K. Kuchibhotla, L. Brown, A. Buja, Junhui Cai, E. George, Linda H. Zhao
S.1. Simulations Continued. The simulation setting in this section is the same as in Section 9. We first describe the reason for using the null situation $\beta_0 = 0_p$ in the model. If $\beta_0$ is an arbitrary non-zero vector, then, for fixed covariates, $X_i Y_i$ cannot be identically distributed and hence only (asymptotically) conservative inference is possible. In simulations this conservativeness confounds with the simultaneity so that the coverage becomes close to 1 (if not 1). In the main manuscript, we have shown plots comparing our method with Berk et al. (2013) and selective inference. We label our confidence region $\hat{R}_{n,M}$ (12) as “UPoSI,” the projected confidence region $\hat{B}_{n,M}$ (28) as “UPoSIBox”, and Berk et al. (2013) as “PoSI.” Tables 1, 2, and 3 show exact numbers for the comparison of our method with Berk et al. (2013). Note that the size of each dot in the row plot of Figure 9 indicates the proportion of confidence regions of that volume among same-sized models. In Settings A and B, the confidence region volumes of same-sized models are the same. In Setting C, volumes of confidence regions of Berk and PoSI Box enlarge (hence smaller $\log(\mathrm{Vol})/|M|$) if the last covariate is included. Tables 4 and 5 show the numbers for the comparison of our method with selective inference when the selection procedure is forward stepwise and LARS, respectively. Sample splitting is a simple procedure that provides valid inference after selection as discussed in Section 1.3. We stress here that this is valid only for independent observations and that the model selected in the first split half could be different from the one selected in the full data. The comparison results with $n = 1000$, $p = 500$ and selection methods forward stepwise, LARS and BIC are summarized in Figure S.1. For sample splitting we have used the Bonferroni correction to obtain simultaneous inference for all coefficients in a model. Table 6 shows the comparison of our method with sample splitting.
Citations: 20
Asymptotic distribution and detection thresholds for two-sample tests based on geometric graphs
IF 4.5 Zone 1 (Mathematics) Q1 Mathematics Pub Date: 2020-10-01 DOI: 10.1214/19-AOS1913
B. Bhattacharya
In this paper we consider the problem of testing the equality of two multivariate distributions based on geometric graphs, constructed using the inter-point distances between the observations. These include the test based on the minimum spanning tree and the K-nearest neighbor (NN) graphs, among others. These tests are asymptotically distribution-free, universally consistent, and computationally efficient, making them particularly useful in modern applications. However, very little is known about the power properties of these tests. In this paper, using theory of stabilizing geometric graphs, we derive the asymptotic distribution of these tests under general alternatives, in the Poissonized setting. Using this, the detection threshold and the limiting local power of the test based on the K-NN graph are obtained, where interesting exponents depending on dimension emerge. This provides a way to compare and justify the performance of these tests in different examples.
Citations: 5