Near-Optimal Mean Estimation with Unknown, Heteroskedastic Variances
Spencer Compton, Gregory Valiant
arXiv:2312.02417 (arXiv - MATH - Statistics Theory), published 2023-12-05
Citation count: 0
Abstract
Given data drawn from a collection of Gaussian variables with a common mean
but different and unknown variances, what is the best algorithm for estimating
their common mean? We present an intuitive and efficient algorithm for this
task. As different closed-form guarantees can be hard to compare, the
Subset-of-Signals model serves as a benchmark for heteroskedastic mean
estimation: given $n$ Gaussian variables with an unknown subset of $m$
variables having variance bounded by 1, what is the optimal estimation error as
a function of $n$ and $m$? Our algorithm resolves this open question up to
logarithmic factors, improving upon the previous best known estimation error by
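polynomial factors when $m = n^c$ for all $0<c<1$. Of particular note, we
obtain error $o(1)$ with $m = \tilde{O}(n^{1/4})$ variance-bounded samples,
whereas previous work required $m = \tilde{\Omega}(n^{1/2})$. Finally, we show
that in the multi-dimensional setting, even for $d=2$, our techniques enable
rates comparable to knowing the variance of each sample.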
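
To make the Subset-of-Signals setup concrete, the following is a minimal simulation sketch, not the algorithm from the paper. It draws $n$ Gaussian samples with a common mean, of which $m$ have standard deviation 1, and compares two textbook baselines (sample mean and sample median). The function names, the parameter `noisy_sd`, and the choice to give every high-variance sample the same standard deviation are assumptions made only for illustration; the model itself leaves those variances unconstrained.

```python
import random
import statistics


def sample_subset_of_signals(n, m, mu, noisy_sd, rng):
    """Draw n Gaussian samples with common mean mu.

    m samples have standard deviation 1 (the "signals"); the remaining
    n - m have standard deviation noisy_sd (an illustrative assumption).
    """
    sds = [1.0] * m + [noisy_sd] * (n - m)
    rng.shuffle(sds)  # the estimator is not told which samples are signals
    return [rng.gauss(mu, sd) for sd in sds]


def compare_baselines(n=400, m=40, mu=3.0, noisy_sd=1000.0, trials=100, seed=0):
    """Return the median absolute error of the sample mean and sample median."""
    rng = random.Random(seed)
    mean_errors, median_errors = [], []
    for _ in range(trials):
        xs = sample_subset_of_signals(n, m, mu, noisy_sd, rng)
        mean_errors.append(abs(statistics.fmean(xs) - mu))
        median_errors.append(abs(statistics.median(xs) - mu))
    return statistics.median(mean_errors), statistics.median(median_errors)


if __name__ == "__main__":
    mean_err, median_err = compare_baselines()
    print(f"sample mean:   typical error {mean_err:.3f}")
    print(f"sample median: typical error {median_err:.3f}")
```

In this configuration the sample mean is dominated by the high-variance samples, while the median benefits from the cluster of low-variance samples near the true mean. Neither baseline is claimed to reach the near-optimal rates described in the abstract.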