The Annals of Statistics: Latest Articles

Universal regression with adversarial responses

The Annals of Statistics

Pub Date : 2023-06-01 DOI: 10.1214/23-aos2299

Moise Blanchard, P. Jaillet

Citations: 1

Minimax rate of distribution estimation on unknown submanifolds under adversarial losses

The Annals of Statistics

Pub Date : 2023-06-01 DOI: 10.1214/23-aos2291

Rong Tang, Yun Yang

Citations: 0

AutoRegressive approximations to nonstationary time series with inference and applications

The Annals of Statistics

Pub Date : 2023-06-01 DOI: 10.1214/23-aos2288

Xiucai Ding, Zhou Zhou

Citations: 0

Nonlinear independent component analysis for discrete-time and continuous-time signals

The Annals of Statistics

Pub Date : 2023-04-01 DOI: 10.1214/23-aos2256

A. Schell, Harald Oberhauser

Citations: 5

Breaking the winner’s curse in Mendelian randomization: Rerandomized inverse variance weighted estimator

The Annals of Statistics

Pub Date : 2023-02-01 DOI: 10.1214/22-aos2247

Xinwei Ma, Jingshen Wang, Chong Wu

Developments in genome-wide association studies and the increasing availability of summary genetic association data have made the application of two-sample Mendelian Randomization (MR) with summary data increasingly popular. Conventional two-sample MR methods often employ the same sample for selecting relevant genetic variants and for constructing final causal estimates. Such a practice often leads to biased causal effect estimates due to the well-known "winner's curse" phenomenon. To address this fundamental challenge, we first examine its consequence on causal effect estimation both theoretically and empirically. We then propose a novel framework that systematically breaks the winner's curse, leading to unbiased association effect estimates for the selected genetic variants. Building upon the proposed framework, we introduce a novel rerandomized inverse variance weighted (RIVW) estimator that is consistent when selection and parameter estimation are conducted on the same sample. Under appropriate conditions, we show that the proposed RIVW estimator for the causal effect converges to a normal distribution asymptotically and its variance can be well estimated. We illustrate the finite-sample performance of our approach through Monte Carlo experiments and two empirical examples.
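The selection bias described in this abstract can be illustrated with a small simulation. The sketch below is not the paper's RIVW estimator; it is a minimal illustration under assumed values (a single true effect `true_beta`, a known standard error `sigma`, a selection threshold of 1.96, and a noise scale `eta`, all hypothetical). It compares the naive practice of selecting and estimating from the same statistic with a generic randomization device: pseudo-noise is added to the selection statistic and subtracted, rescaled, from the estimation statistic, so that the two are uncorrelated and, being jointly Gaussian, independent.

```python
import random
import statistics


def simulate(true_beta=0.1, sigma=0.1, threshold=1.96, eta=0.5,
             n=200_000, seed=0):
    """Return (naive_mean, randomized_mean) of effect estimates after selection.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    naive, randomized = [], []
    for _ in range(n):
        # Observed association estimate: true effect plus Gaussian noise.
        beta_hat = rng.gauss(true_beta, sigma)

        # Naive practice: the same statistic is used to select and to estimate.
        if abs(beta_hat / sigma) > threshold:
            naive.append(beta_hat)

        # Randomized practice: pseudo-noise w splits beta_hat into two
        # statistics whose covariance is sigma^2 - sigma^2 = 0.
        w = rng.gauss(0.0, 1.0)
        select_stat = beta_hat + eta * sigma * w
        estimate_stat = beta_hat - sigma * w / eta
        if abs(select_stat / sigma) > threshold:
            randomized.append(estimate_stat)

    return statistics.fmean(naive), statistics.fmean(randomized)


if __name__ == "__main__":
    naive_mean, randomized_mean = simulate()
    print(f"true effect:               {0.1:.3f}")
    print(f"naive mean after selection: {naive_mean:.3f}")
    print(f"randomized mean:            {randomized_mean:.3f}")
```

In this setup the naive average of selected estimates is inflated well above the true effect, while the randomized average stays close to it, at the cost of a noisier individual estimate (its variance is `sigma**2 * (1 + 1 / eta**2)`).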


Citations: 1

Sharp global convergence guarantees for iterative nonconvex optimization with random data

The Annals of Statistics

Pub Date : 2023-02-01 DOI: 10.1214/22-aos2246

Kabir Chandrasekher, Ashwin Pananjady, Christos Thrampoulidis

We consider a general class of regression models with normally distributed covariates, and the associated nonconvex problem of fitting these models from data. We develop a general recipe for analyzing the convergence of iterative algorithms for this task from a random initialization. In particular, provided each iteration can be written as the solution to a convex optimization problem satisfying some natural conditions, we leverage Gaussian comparison theorems to derive a deterministic sequence that provides sharp upper and lower bounds on the error of the algorithm with sample-splitting. Crucially, this deterministic sequence accurately captures both the convergence rate of the algorithm and the eventual error floor in the finite-sample regime, and is distinct from the commonly used “population” sequence that results from taking the infinite-sample limit. We apply our general framework to derive several concrete consequences for parameter estimation in popular statistical models including phase retrieval and mixtures of regressions. Provided the sample size scales near-linearly in the dimension, we show sharp global convergence rates for both higher-order algorithms based on alternating updates and first-order algorithms based on subgradient descent. These corollaries, in turn, reveal multiple nonstandard phenomena that are then corroborated by extensive numerical experiments.


Citations: 4

Sup-norm adaptive drift estimation for multivariate nonreversible diffusions

The Annals of Statistics

Pub Date : 2022-12-01 DOI: 10.1214/22-aos2237

Cathrine Aeckerle-Willems, C. Strauch

We consider the question of estimating the drift for a large class of ergodic multivariate and possibly nonreversible diffusion processes, based on continuous observations, in sup-norm loss. Nonparametric classes of smooth functions of unknown order are considered, and we suggest an adaptive approach that allows the construction of drift estimators attaining optimal sup-norm rates of convergence. Reversibility structures and related functional inequalities are known to be key tools for these estimation problems. We can discard such restrictions by making use of mixing properties which are satisfied for the very general class of processes under consideration. In the analysis of diffusions, the scalar case is very distinct from the general multivariate setting. Therefore, we treat scalar and multivariate processes separately, which leads to univariate results that are improved in several aspects. While we consider drift estimation on bounded domains for exponentially β-mixing multivariate processes, for scalar diffusion processes we work under minimal assumptions that allow estimation of unbounded drift terms over the entire real line, and we provide classical minimax results (including lower bounds) which cannot be obtained under state-of-the-art conditions in the multivariate case. In addition, we prove a Donsker theorem for the classical kernel estimator of the invariant density in the scalar setting and establish its semiparametric efficiency.


Citations: 6

Choosing between persistent and stationary volatility

The Annals of Statistics

Pub Date : 2022-12-01 DOI: 10.1214/22-aos2236

Ilias Chronopoulos, L. Giraitis, G. Kapetanios

This paper suggests a multiplicative volatility model where volatility is decomposed into a stationary and a non-stationary persistent part. We provide a testing procedure to determine which type of volatility is prevalent in the data. The persistent part of volatility is associated with a nonstationary persistent process satisfying some smoothness and moment conditions. The stationary part is related to stationary conditional heteroskedasticity. We outline theory and conditions that allow the extraction of the persistent part from the data and enable standard conditional heteroskedasticity tests to detect stationary volatility after persistent volatility is taken into account. Monte Carlo results support the testing strategy in small samples. The empirical application of the theory supports the persistent volatility paradigm, suggesting that stationary conditional heteroskedasticity is considerably less pronounced than previously thought.


Citations: 1

Coverage of credible intervals in Bayesian multivariate isotonic regression

The Annals of Statistics

Pub Date : 2022-11-22 DOI: 10.1214/23-aos2298

Kangkang Wang, S. Ghosal

We consider the nonparametric multivariate isotonic regression problem, where the regression function is assumed to be nondecreasing with respect to each predictor. Our goal is to construct a Bayesian credible interval for the function value at a given interior point with assured limiting frequentist coverage. We put a prior on unrestricted step-functions, but make inference using the posterior measure induced by an "immersion map" from the space of unrestricted functions to that of multivariate monotone functions. This allows maintaining the natural conjugacy for posterior sampling. A natural immersion map to use is a projection via a distance, but in the present context, a block isotonization map is found to be more useful. The approach of using the induced "immersion posterior" measure instead of the original posterior to make inference provides a useful extension of the Bayesian paradigm, particularly helpful when the model space is restricted by some complex relations. We establish a key weak convergence result for the posterior distribution of the function at a point in terms of some functional of a multi-indexed Gaussian process that leads to an expression for the limiting coverage of the Bayesian credible interval. Analogous to a recent result for univariate monotone functions, we find that the limiting coverage is slightly higher than the credibility, the opposite of a phenomenon observed in smoothing problems. Interestingly, the relation between credibility and limiting coverage does not involve any unknown parameter. Hence, by a recalibration procedure, we can get a predetermined asymptotic coverage by choosing a suitable credibility level smaller than the targeted coverage, and thus also shorten the credible intervals.


Citations: 3

Optimal Permutation Estimation in CrowdSourcing problems

The Annals of Statistics

Pub Date : 2022-11-08 DOI: 10.1214/23-aos2271

Emmanuel Pilliat, A. Carpentier, N. Verzelen

Motivated by crowd-sourcing applications, we consider a model where we have partial observations from a bivariate isotonic $n \times d$ matrix with an unknown permutation $\pi^*$ acting on its rows. Focusing on the twin problems of recovering the permutation $\pi^*$ and estimating the unknown matrix, we introduce a polynomial-time procedure achieving the minimax risk for these two problems, for all possible values of $n$ and $d$ and all possible sampling efforts. Along the way, we establish that, in some regimes, recovering the unknown permutation $\pi^*$ is considerably simpler than estimating the matrix.


Citations: 2
