Author: Thomas B. Berrett
arXiv identifier: arxiv-2409.05729
Venue: arXiv - STAT - Statistics Theory
Publication date: 2024-09-09
URL: https://doi.org/arxiv-2409.05729
Citations: 0
Efficient estimation with incomplete data via generalised ANOVA decomposition
We study the efficient estimation of a class of mean functionals in settings
where a complete multivariate dataset is complemented by additional datasets
recording subsets of the variables of interest. These datasets are allowed to
have a general, in particular non-monotonic, structure. Our main contribution
is to characterise the asymptotic minimal mean squared error for these problems
and to introduce an estimator whose risk approximately matches this lower
bound. We show that the efficient rescaled variance can be expressed as the
minimal value of a quadratic optimisation problem over a function space, thus
establishing a fundamental link between these estimation problems and the
theory of generalised ANOVA decompositions. Our estimation procedure uses
iterated nonparametric regression to mimic an approximate influence function
derived through gradient descent. We prove that this estimator is approximately
normally distributed, provide an estimator of its variance and thus develop
confidence intervals of asymptotically minimal width. Finally we study a more
direct estimator, which can be seen as a U-statistic with a data-dependent
kernel, showing that it is also efficient under stronger regularity conditions.
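
To make the setting concrete, here is a minimal sketch of the simplest special case: complete pairs (X, Y) are supplemented by one extra dataset that records X only, and the target is the mean of Y. This is not the estimator proposed in the paper. It is the classical regression-adjusted mean, with an ordinary least-squares fit standing in for the nonparametric regression step. The function names `augmented_mean` and `simulate`, the linear toy model, and the sample sizes are all assumptions made for illustration.

```python
import numpy as np


def augmented_mean(x_complete, y_complete, x_extra):
    """Estimate E[Y] from complete (X, Y) pairs plus extra X-only observations.

    Fits a least-squares line g(x) = intercept + slope * x on the complete
    cases, then shifts the complete-case mean of Y by the gap between the
    mean of g over all observed X values and its mean over the complete cases.
    """
    x_complete = np.asarray(x_complete, dtype=float)
    y_complete = np.asarray(y_complete, dtype=float)
    x_all = np.concatenate([x_complete, np.asarray(x_extra, dtype=float)])

    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(x_complete, y_complete, deg=1)

    def g(x):
        return intercept + slope * x

    return y_complete.mean() + g(x_all).mean() - g(x_complete).mean()


def simulate(n_reps=500, n=100, m=1000, seed=0):
    """Compare Monte Carlo variances of the plain and augmented estimators.

    Toy model (an assumption): X ~ N(0, 1) and Y = 2 X + noise, noise ~ N(0, 1),
    with n complete pairs and m additional X-only observations per replication.
    Returns (variance of the complete-case mean, variance of the augmented mean).
    """
    rng = np.random.default_rng(seed)
    naive, augmented = [], []
    for _ in range(n_reps):
        x = rng.normal(size=n)
        y = 2.0 * x + rng.normal(size=n)
        x_extra = rng.normal(size=m)
        naive.append(y.mean())
        augmented.append(augmented_mean(x, y, x_extra))
    return float(np.var(naive)), float(np.var(augmented))


if __name__ == "__main__":
    var_naive, var_augmented = simulate()
    print(f"complete-case mean variance: {var_naive:.4f}")
    print(f"augmented mean variance:     {var_augmented:.4f}")
```

In this toy model a standard calculation suggests a variance of about 5/n for the complete-case mean, against roughly 1/n + 4/(n + m) for the augmented estimator, so the extra X-only data should reduce the variance noticeably. The general, non-monotonic missingness patterns treated in the abstract are beyond what this two-dataset sketch covers.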