Annals of Statistics最新文献

英文中文

AVERAGE TREATMENT EFFECTS IN THE PRESENCE OF UNKNOWN INTERFERENCE. 在存在未知干扰的情况下的平均治疗效果。

IF 4.5 1区数学 Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2021-04-01 Epub Date: 2021-04-02 DOI: 10.1214/20-aos1973

Fredrik Sävje, Peter Aronow, Michael Hudgens

We investigate large-sample properties of treatment effect estimators under unknown interference in randomized experiments. The inferential target is a generalization of the average treatment effect estimand that marginalizes over potential spillover effects. We show that estimators commonly used to estimate treatment effects under no interference are consistent for the generalized estimand for several common experimental designs under limited but otherwise arbitrary and unknown interference. The rates of convergence depend on the rate at which the amount of interference grows and the degree to which it aligns with dependencies in treatment assignment. Importantly for practitioners, the results imply that if one erroneously assumes that units do not interfere in a setting with limited, or even moderate, interference, standard estimators are nevertheless likely to be close to an average treatment effect if the sample is sufficiently large. Conventional confidence statements may, however, not be accurate.

我们研究了随机实验中未知干扰下治疗效果估计值的大样本特性。推论目标是平均治疗效果估计值的广义化，它将潜在的溢出效应边际化。我们的研究表明，在无干扰情况下常用于估计治疗效果的估计值，在有限但任意且未知的干扰情况下，对于几种常见的实验设计，其广义估计值是一致的。收敛速度取决于干扰量的增长速度以及干扰量与治疗分配依赖性的一致程度。对于实践者来说，重要的是，研究结果表明，如果错误地假定单位在有限甚至中等程度的干扰下不会产生干扰，那么如果样本足够大，标准估计值很可能接近平均治疗效果。然而，传统的置信度声明可能并不准确。

引用次数: 0

Necessary and sufficient conditions for variable selection consistency of the LASSO in high dimensions 高维LASSO变量选择一致性的充分必要条件

IF 4.5 1区数学 Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2021-04-01 DOI: 10.1214/20-AOS1979

S. Lahiri

This paper investigates conditions for variable selection consistency of the LASSO in high dimensional regression models and gives necessary and sufficient conditions for the same, potentially allowing the model dimension p to grow arbitrarily fast as a function of the sample size n. These conditions require both upper and lower bounds on the growth rate of the penalty parameter. It turns out that a variant of the irrepresentable Condition (IRC) of Zhao and Yu (2006), herein called the lower irrepresentable Condition (or LIRC), is determined by the lower bound considerations while the upper bound considerations lead to a new condition, called the upper irrepresentable Condition (or UIRC) in this paper. It is shown that the LIRC together with the UIRC is necessary and sufficient for the variable selection consistency of the LASSO, thereby settling a conjecture of (Zhao and Yu, 2006). Further, it is shown that under some mild regularity conditions, the penalty parameter must necessarily tend to infinity at a certain minimal rate to ensure variable selection consistency of the LASSO and that the corresponding LASSO estimators of the nonzero regression parameters can not be √ nconsistent (even for individual parameters). Thus, under fairly general conditions, the LASSO with a single choice of the penalty parameter can not achieve both variable selection consistency and √ n-consistency simultaneously. MSC 2010 subject classifications: Primary62E20; secondary 62J05.

本文研究了高维回归模型中LASSO变量选择一致性的条件，并给出了必要和充分条件，可能允许模型维数p作为样本量n的函数任意快速增长。这些条件需要惩罚参数增长率的上界和下界。结果表明，Zhao和Yu(2006)的不可表征条件(IRC)的一个变体，这里称为下不可表征条件(LIRC)，由下界考虑决定，而上界考虑导致一个新的条件，本文称为上不可表征条件(UIRC)。证明了lrc和UIRC对于LASSO的变量选择一致性是充分必要的，从而解决了(Zhao and Yu, 2006)的猜想。进一步证明了在一些温和的正则性条件下，惩罚参数必须以一定的最小速率趋于无穷大，以保证LASSO的变量选择一致性，并且相应的非零回归参数的LASSO估计量不能是不一致的(即使是单个参数)。因此，在相当一般的条件下，惩罚参数选择单一的LASSO不能同时实现变量选择一致性和√n一致性。MSC 2010学科分类:Primary62E20;二次62 j05。

{"title":"Necessary and sufficient conditions for variable selection consistency of the LASSO in high dimensions","authors":"S. Lahiri","doi":"10.1214/20-AOS1979","DOIUrl":"https://doi.org/10.1214/20-AOS1979","url":null,"abstract":"This paper investigates conditions for variable selection consistency of the LASSO in high dimensional regression models and gives necessary and sufficient conditions for the same, potentially allowing the model dimension p to grow arbitrarily fast as a function of the sample size n. These conditions require both upper and lower bounds on the growth rate of the penalty parameter. It turns out that a variant of the irrepresentable Condition (IRC) of Zhao and Yu (2006), herein called the lower irrepresentable Condition (or LIRC), is determined by the lower bound considerations while the upper bound considerations lead to a new condition, called the upper irrepresentable Condition (or UIRC) in this paper. It is shown that the LIRC together with the UIRC is necessary and sufficient for the variable selection consistency of the LASSO, thereby settling a conjecture of (Zhao and Yu, 2006). Further, it is shown that under some mild regularity conditions, the penalty parameter must necessarily tend to infinity at a certain minimal rate to ensure variable selection consistency of the LASSO and that the corresponding LASSO estimators of the nonzero regression parameters can not be √ nconsistent (even for individual parameters). Thus, under fairly general conditions, the LASSO with a single choice of the penalty parameter can not achieve both variable selection consistency and √ n-consistency simultaneously. MSC 2010 subject classifications: Primary62E20; secondary 62J05.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":" ","pages":""},"PeriodicalIF":4.5,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43542740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 12

ASYMMETRY HELPS: EIGENVALUE AND EIGENVECTOR ANALYSES OF ASYMMETRICALLY PERTURBED LOW-RANK MATRICES. 不对称有助于：不对称扰动低阶矩阵的特征值和特征向量分析。

IF 4.5 1区数学 Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2021-02-01 Epub Date: 2021-01-29 DOI: 10.1214/20-aos1963

Yuxin Chen, Chen Cheng, Jianqing Fan

This paper is concerned with the interplay between statistical asymmetry and spectral methods. Suppose we are interested in estimating a rank-1 and symmetric matrix $M^{⋆} \in ℝ^{n \times n}$ , yet only a randomly perturbed version M is observed. The noise matrix M - M ^⋆ is composed of independent (but not necessarily homoscedastic) entries and is, therefore, not symmetric in general. This might arise if, for example, we have two independent samples for each entry of M ^⋆ and arrange them in an asymmetric fashion. The aim is to estimate the leading eigenvalue and the leading eigenvector of M ^⋆. We demonstrate that the leading eigenvalue of the data matrix M can be $O (\sqrt{n})$ times more accurate (up to some log factor) than its (unadjusted) leading singular value of M in eigenvalue estimation. Moreover, the eigen-decomposition approach is fully adaptive to heteroscedasticity of noise, without the need of any prior knowledge about the noise distributions. In a nutshell, this curious phenomenon arises since the statistical asymmetry automatically mitigates the bias of the eigenvalue approach, thus eliminating the need of careful bias correction. Additionally, we develop appealing non-asymptotic eigenvector perturbation bounds; in particular, we are able to bound the perterbation of any linear function of the leading eigenvector of M (e.g. entrywise eigenvector perturbation). We also provide partial theory for the more general rank-r case. The takeaway message is this: arranging the data samples in an asymmetric manner and performing eigen-decomposition could sometimes be quite beneficial.

本文关注统计不对称与光谱方法之间的相互作用。假设我们有兴趣估算一个秩为 1 的对称矩阵 M ⋆ ∈ ℝ n × n，但只观测到随机扰动版本的 M。噪声矩阵 M - M ⋆ 由独立（但不一定是同源）条目组成，因此一般不是对称的。例如，如果我们对 M ⋆ 的每个条目都有两个独立样本，并以非对称方式排列，就可能出现这种情况。我们的目的是估计 M ⋆ 的前导特征值和前导特征向量。我们证明，在特征值估计中，数据矩阵 M 的前导特征值比 M 的（未调整的）前导奇异值精确 O ( n ) 倍（达到某个对数因子）。此外，特征分解方法还能完全适应噪声的异方差性，而无需任何关于噪声分布的先验知识。简而言之，这种奇特现象的出现是因为统计不对称自动减轻了特征值方法的偏差，从而无需进行仔细的偏差校正。此外，我们还开发了具有吸引力的非渐近特征向量扰动约束；特别是，我们能够约束 M 的前导特征向量的任何线性函数的扰动（例如入口特征向量扰动）。我们还为更一般的秩r情况提供了部分理论。我们的启示是：以非对称方式排列数据样本并进行特征分解有时会非常有益。

{"title":"ASYMMETRY HELPS: EIGENVALUE AND EIGENVECTOR ANALYSES OF ASYMMETRICALLY PERTURBED LOW-RANK MATRICES.","authors":"Yuxin Chen, Chen Cheng, Jianqing Fan","doi":"10.1214/20-aos1963","DOIUrl":"10.1214/20-aos1963","url":null,"abstract":"This paper is concerned with the interplay between statistical asymmetry and spectral methods. Suppose we are interested in estimating a rank-1 and symmetric matrix <math> <mrow> <msup><mstyle><mi>M</mi></mstyle> <mo>⋆</mo></msup> <mo>∈</mo> <msup><mi>ℝ</mi> <mrow><mi>n</mi> <mo>×</mo> <mi>n</mi></mrow> </msup> </mrow> </math> , yet only a randomly perturbed version M is observed. The noise matrix M - M ⋆ is composed of independent (but not necessarily homoscedastic) entries and is, therefore, not symmetric in general. This might arise if, for example, we have two independent samples for each entry of M ⋆ and arrange them in an asymmetric fashion. The aim is to estimate the leading eigenvalue and the leading eigenvector of M ⋆. We demonstrate that the leading eigenvalue of the data matrix M can be <math><mrow><mi>O</mi> <mo>(</mo> <msqrt><mi>n</mi></msqrt> <mo>)</mo></mrow> </math> times more accurate (up to some log factor) than its (unadjusted) leading singular value of M in eigenvalue estimation. Moreover, the eigen-decomposition approach is fully adaptive to heteroscedasticity of noise, without the need of any prior knowledge about the noise distributions. In a nutshell, this curious phenomenon arises since the statistical asymmetry automatically mitigates the bias of the eigenvalue approach, thus eliminating the need of careful bias correction. Additionally, we develop appealing non-asymptotic eigenvector perturbation bounds; in particular, we are able to bound the perterbation of any linear function of the leading eigenvector of M (e.g. entrywise eigenvector perturbation). We also provide partial theory for the more general rank-r case. The takeaway message is this: arranging the data samples in an asymmetric manner and performing eigen-decomposition could sometimes be quite beneficial.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"49 1","pages":"435-458"},"PeriodicalIF":4.5,"publicationDate":"2021-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8300484/pdf/nihms-1639565.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39218981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A rule of thumb: Run lengths to false alarm of many types of control charts run in parallel on dependent streams are asymptotically independent 经验法则：在相关流上并行运行的许多类型的控制图的虚警运行长度是渐近独立的

IF 4.5 1区数学 Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2021-02-01 DOI: 10.1214/20-AOS1968

M. Pollak

Consider a process that produces a series of independent identically distributed vectors. A change in an underlying state may become manifest in a modification of one or more of the marginal distributions. Often, the dependence structure between coordinates is unknown, impeding surveillance based on the joint distribution. A popular approach is to construct control charts for each coordinate separately and raise an alarm the first time any (or some) of the control charts signals. The difficulty is obtaining an expression for the overall average run length to false alarm (ARL2FA).We argue that despite the dependence structure, when the process is in control, for large ARLs to false alarm, run lengths of many types of control charts run in parallel are asymptotically independent. Furthermore, often, in-control run lengths are asymptotically exponentially distributed, enabling uncomplicated asymptotic expressions for the ARL2FA.We prove this assertion for certain Cusum and Shiryaev–Roberts-type control charts and illustrate it by simulations.

考虑一个生成一系列独立的同分布向量的过程。潜在状态的变化可能在一个或多个边际分布的修改中变得明显。通常，坐标之间的依赖结构是未知的，阻碍了基于联合分布的监视。一种流行的方法是分别为每个坐标构建控制图，并在任何（或某些）控制图发出信号时发出警报。困难在于获得到虚警的总平均游程长度（ARL2FA）的表达式。我们认为，尽管存在依赖结构，但当过程处于控制状态时，对于大的ARL到虚警，许多类型的控制图并行运行的游程长度是渐近独立的。此外，通常情况下，控制行程长度是渐近指数分布的，这使得ARL2FA能够得到简单的渐近表达式。我们在某些Cusum和Shiryaev–Roberts型控制图上证明了这一断言，并通过仿真进行了说明。

引用次数: 0

Wordlength enumerator for fractional factorial designs 分数阶乘设计的字长枚举器

IF 4.5 1区数学 Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2021-02-01 DOI: 10.1214/20-AOS1955

Yu Tang, Hongquan Xu

While the minimum aberration criterion is popular for selecting good designs with qualitative factors under an ANOVA model, the minimum $beta$-aberration criterion is more suitable for selecting designs with quantitative factors under a polynomial model. In this paper, we propose the concept of wordlength enumerator to unify these two criteria. The wordlength enumerator is defined as an average similarity of contrasts among all possible pairs of runs. The wordlength enumerator is easy and fast to compute, and can be used to compare and rank designs efficiently. Based on the wordlength enumerator, we develop simple and fast methods for calculating both the generalized wordlength pattern and the $beta$-wordlength pattern. We further obtain a lower bound of the wordlength enumerator for three-level designs and characterize the combinatorial structure of designs achieving the lower bound. Finally, we propose two methods for constructing supersaturated designs that have both generalized minimum aberration and minimum $beta$-aberration.

虽然最小像差标准在ANOVA模型下适用于选择具有定性因素的良好设计，但最小$beta$-像差标准更适用于在多项式模型下选择具有定量因素的设计。在本文中，我们提出了单词长度枚举器的概念来统一这两个标准。字长枚举器被定义为所有可能的运行对之间的对比度的平均相似性。字长枚举器计算简单快捷，可用于有效地比较和排序设计。基于字长枚举器，我们开发了简单快速的方法来计算广义字长模式和$beta$字长模式。我们进一步获得了三级设计的字长枚举器的下界，并描述了实现该下界的设计的组合结构。最后，我们提出了两种构造过饱和设计的方法，这两种方法同时具有广义最小像差和最小$beta$-像差。

引用次数: 4

Correction note: “Optimal two-stage procedures for estimating location and size of the maximum of a multivariate regression function” Ann. Statist. 40 (2012) 2850–2876 更正说明：“估计多元回归函数最大值的位置和大小的最佳两阶段程序”Ann.Statist。40（2012）2850–2876

IF 4.5 1区数学 Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2021-02-01 DOI: 10.1214/20-AOS1993

E. Belitser, S. Ghosal, H. Zanten

We rectify a wrongly stated fact in the paper of Belitser, Ghosal and van Zanten (Ann. Statist.40 (2012) 2850–2876).

我们纠正了Belitser, Ghosal和van Zanten (Ann)的论文中错误陈述的事实。统计。40(2012)2850-2876)。

引用次数: 1

Correction note: Higher order elicitability and Osband’s principle 更正注释：高阶启发性和Osband原理

IF 4.5 1区数学 Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2021-02-01 DOI: 10.1214/20-AOS2014

Tobias Fissler, Johanna F. Ziegel

引用次数: 3

Convergence of covariance and spectral density estimates for high-dimensional locally stationary processes 高维局部平稳过程的协方差和谱密度估计的收敛性

IF 4.5 1区数学 Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2021-02-01 DOI: 10.1214/20-AOS1954

Danna Zhang, W. Wu

Covariances and spectral density functions play a fundamental role in the theory of time series. There is a well-developed asymptotic theory for their estimates for low-dimensional stationary processes. For high-dimensional nonstationary processes, however, many important problems on their asymptotic behaviors are still unanswered. This paper presents a systematic asymptotic theory for the estimates of time-varying second-order statistics for a general class of high-dimensional locally stationary processes. Using the framework of functional dependence measure, we derive convergence rates of the estimates which depend on the sample size $T$, the dimension $p$, the moment condition and the dependence of the underlying processes.

协方差和谱密度函数在时间序列理论中起着重要作用。对于低维平稳过程的估计，有一个完善的渐近理论。然而，对于高维非平稳过程，关于其渐近行为的许多重要问题仍然没有得到解答。本文给出了一类高维局部平稳过程时变二阶统计量估计的系统渐近理论。利用函数依赖性测度的框架，我们导出了估计的收敛率，这些估计取决于样本大小$T$、维度$p$、矩条件和基本过程的依赖性。

引用次数: 22

ASYMPTOTICALLY INDEPENDENT U-STATISTICS IN HIGH-DIMENSIONAL TESTING. 高维测试中渐近独立的 u 统计量。

IF 3.2 1区数学 Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2021-02-01 Epub Date: 2021-01-29 DOI: 10.1214/20-aos1951

Yinqiu He, Gongjun Xu, Chong Wu, Wei Pan

Many high-dimensional hypothesis tests aim to globally examine marginal or low-dimensional features of a high-dimensional joint distribution, such as testing of mean vectors, covariance matrices and regression coefficients. This paper constructs a family of U-statistics as unbiased estimators of the ℓ _p -norms of those features. We show that under the null hypothesis, the U-statistics of different finite orders are asymptotically independent and normally distributed. Moreover, they are also asymptotically independent with the maximum-type test statistic, whose limiting distribution is an extreme value distribution. Based on the asymptotic independence property, we propose an adaptive testing procedure which combines p-values computed from the U-statistics of different orders. We further establish power analysis results and show that the proposed adaptive procedure maintains high power against various alternatives.

许多高维假设检验旨在全面检验高维联合分布的边际或低维特征，如检验均值向量、协方差矩阵和回归系数。本文构建了一系列 U 统计量，作为这些特征的 ℓ p 矩的无偏估计值。我们证明，在零假设下，不同有限阶的 U 统计量是渐近独立和正态分布的。此外，它们与最大类型检验统计量也是渐近独立的，最大类型检验统计量的极限分布是极值分布。基于渐近独立特性，我们提出了一种自适应检验程序，该程序结合了从不同阶的 U 统计量计算出的 p 值。我们进一步建立了功率分析结果，并表明所提出的自适应程序在面对各种替代方案时都能保持较高的功率。

引用次数: 0

Sharp minimax distribution estimation for current status censoring with or without missing 有或没有缺失的当前状态审查的尖锐极小最大值分布估计

IF 4.5 1区数学 Q1 STATISTICS & PROBABILITY

Annals of Statistics

Pub Date : 2021-02-01 DOI: 10.1214/20-AOS1970

S. Efromovich

Nonparametric estimation of the cumulative distribution function and the probability density of a lifetime X modified by a current status censoring (CSC), including cases of right and left missing data, is a classical ill-posed problem with biased data. The biased nature of CSC data may preclude us from consistent estimation unless the biasing function is known or may be estimated, and its ill-posed nature slows down rates of convergence. Under a traditionally studied CSC, we observe a sample from $(Z,Delta )$ where a continuous monitoring time $Z$ is independent of $X$, $Delta :=I(Xleq Z)$ is the status, and the bias of observations is created by the density of $Z$ which is estimable. In presence of right or left missing, we observe corresponding samples from $(Delta Z,Delta )$ or $((1-Delta )Z,Delta )$; the data are again biased but now the density of $Z$ cannot be estimated from the data. As a result, to solve the estimation problem, either the density of $Z$ must be known (like in a controlled study) or an extra cross-sectional sampling of $Z$, which is typically simpler than an underlying CSC study, be conducted. The main aim of the paper is to develop for this biased and ill-posed problem the theory of efficient (sharp-minimax) estimation which is inspired by known results for the case of directly observed $X$. Among interesting aspects of the developed theory: (i) While sharp-minimax analysis of missing CSC may follow the classical Pinsker’s methodology, analysis of CSC requires a more complicated estimation procedure based on a special smoothing in both frequency and time domains; (ii) Efficient estimation requires solving an old-standing problem of approximating aperiodic Sobolev functions; (iii) If smoothness of the cdf of $X$ is known, then its rate-minimax estimation is possible even if the density of $Z$ is rougher. Real and simulated examples, as well as extensions of the core models to dependent $X$ and Z and case-control CSC, are presented.

通过当前状态截尾（CSC）修改的寿命X的累积分布函数和概率密度的非参数估计，包括左右缺失数据的情况，是一个具有偏差数据的经典不适定问题。CSC数据的偏差性质可能会使我们无法进行一致的估计，除非偏差函数是已知的或可以估计的，并且其不适定性质会减慢收敛速度。在传统研究的CSC下，我们观察到来自$（Z，Delta）$的样本，其中连续监测时间$Z$独立于$X$，$Delta：=I（Xleq Z）$是状态，并且观测的偏差由$Z$的密度产生，这是可估计的。在存在右或左缺失的情况下，我们观察到来自$（Delta Z，Delta）$或$（（1-Delta（Z，Deleta））$的相应样本；数据再次有偏差，但现在不能根据数据估计$Z$的密度。因此，为了解决估计问题，必须知道$Z$的密度（就像在对照研究中一样），或者进行额外的$Z$横截面抽样，这通常比基础CSC研究更简单。本文的主要目的是针对这一有偏和不适定性问题发展有效（尖锐极小极大）估计理论，该理论受到直接观测$X$情况下已知结果的启发。在所发展的理论的有趣方面中：（i）虽然缺失CSC的尖锐极小极大分析可能遵循经典的Pinsker方法，但CSC的分析需要基于频域和时域中的特殊平滑的更复杂的估计过程；（ii）有效的估计需要解决近似非周期Sobolev函数的老问题；（iii）如果$X$的cdf的光滑性是已知的，则即使$Z$的密度更粗糙，其速率最小最大估计也是可能的。给出了真实和模拟的例子，以及将核心模型扩展到依赖$X$和Z以及病例对照CSC。

{"title":"Sharp minimax distribution estimation for current status censoring with or without missing","authors":"S. Efromovich","doi":"10.1214/20-AOS1970","DOIUrl":"https://doi.org/10.1214/20-AOS1970","url":null,"abstract":"Nonparametric estimation of the cumulative distribution function and the probability density of a lifetime X modified by a current status censoring (CSC), including cases of right and left missing data, is a classical ill-posed problem with biased data. The biased nature of CSC data may preclude us from consistent estimation unless the biasing function is known or may be estimated, and its ill-posed nature slows down rates of convergence. Under a traditionally studied CSC, we observe a sample from $(Z,Delta )$ where a continuous monitoring time $Z$ is independent of $X$, $Delta :=I(Xleq Z)$ is the status, and the bias of observations is created by the density of $Z$ which is estimable. In presence of right or left missing, we observe corresponding samples from $(Delta Z,Delta )$ or $((1-Delta )Z,Delta )$; the data are again biased but now the density of $Z$ cannot be estimated from the data. As a result, to solve the estimation problem, either the density of $Z$ must be known (like in a controlled study) or an extra cross-sectional sampling of $Z$, which is typically simpler than an underlying CSC study, be conducted. The main aim of the paper is to develop for this biased and ill-posed problem the theory of efficient (sharp-minimax) estimation which is inspired by known results for the case of directly observed $X$. Among interesting aspects of the developed theory: (i) While sharp-minimax analysis of missing CSC may follow the classical Pinsker’s methodology, analysis of CSC requires a more complicated estimation procedure based on a special smoothing in both frequency and time domains; (ii) Efficient estimation requires solving an old-standing problem of approximating aperiodic Sobolev functions; (iii) If smoothness of the cdf of $X$ is known, then its rate-minimax estimation is possible even if the density of $Z$ is rougher. Real and simulated examples, as well as extensions of the core models to dependent $X$ and Z and case-control CSC, are presented.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"49 1","pages":"568-589"},"PeriodicalIF":4.5,"publicationDate":"2021-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49238602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

首页上一页

下一页尾页

类型

全部化学•材料生命科学医学物理工程技术环境•农林材料科学地球科学法学管理学化学环境科学与生态学计算机科学教育学经济学农林科学人文科学生物学数学物理与天体物理心理学综合性期刊其他工业工程理学历史学农学文学信息工程

数据库

全部 ACS Publications Elsevier ieeexplore Springer The Royal Society of Chemistry Wiley

期刊

Annals of Statistics

全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.

﹀