{"title":"The Power of Linear Estimators","authors":"G. Valiant, Paul Valiant","doi":"10.1109/FOCS.2011.81","DOIUrl":null,"url":null,"abstract":"For a broad class of practically relevant distribution properties, which includes entropy and support size, nearly all of the proposed estimators have an especially simple form. Given a set of independent samples from a discrete distribution, these estimators tally the vector of summary statistics -- the number of domain elements seen once, twice, etc. in the sample -- and output the dot product between these summary statistics, and a fixed vector of coefficients. We term such estimators \\emph{linear}. This historical proclivity towards linear estimators is slightly perplexing, since, despite many efforts over nearly 60 years, all proposed such estimators have significantly sub optimal convergence, compared to the bounds shown in [VV11]. Our main result, in some sense vindicating this insistence on linear estimators, is that for any property in this broad class, there exists a near-optimal linear estimator. Additionally, we give a practical and polynomial-time algorithm for constructing such estimators for any given parameters. While this result does not yield explicit bounds on the sample complexities of these estimation tasks, we leverage the insights provided by this result to give explicit constructions of near-optimal linear estimators for three properties: entropy, $L_1$ distance to uniformity, and for pairs of distributions, $L_1$ distance. Our entropy estimator, when given $O(\\frac{n}{\\eps \\log n})$ independent samples from a distribution of support at most $n,$ will estimate the entropy of the distribution to within additive accuracy $\\epsilon$, with probability of failure $o(1/poly(n)).$ From the recent lower bounds given in [VV11], this estimator is optimal, to constant factor, both in its dependence on $n$, and its dependence on $\\epsilon.$ In particular, the inverse-linear convergence rate of this estimator resolves the main open question of [VV11], which left open the possibility that the error decreased only with the square root of the number of samples. Our distance to uniformity estimator, when given $O(\\frac{m}{\\eps^2\\log m})$ independent samples from any distribution, returns an $\\eps$-accurate estimate of the $L_1$ distance to the uniform distribution of support $m$. This is constant-factor optimal, for constant $\\epsilon$. Finally, our framework extends naturally to properties of pairs of distributions, including estimating the $L_1$ distance and KL-divergence between pairs of distributions. We give an explicit linear estimator for estimating $L_1$ distance to additive accuracy $\\epsilon$ using $O(\\frac{n}{\\eps^2\\log n})$ samples from each distribution, which is constant-factor optimal, for constant $\\epsilon$. 
This is the first sub linear-sample estimator for this fundamental property.","PeriodicalId":326048,"journal":{"name":"2011 IEEE 52nd Annual Symposium on Foundations of Computer Science","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"148","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 52nd Annual Symposium on Foundations of Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FOCS.2011.81","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 148
Abstract
For a broad class of practically relevant distribution properties, including entropy and support size, nearly all proposed estimators share an especially simple form. Given a set of independent samples from a discrete distribution, these estimators tally a vector of summary statistics -- the number of domain elements seen once, twice, etc. in the sample -- and output the dot product between this vector of summary statistics and a fixed vector of coefficients. We term such estimators \emph{linear}. This historical proclivity towards linear estimators is slightly perplexing since, despite many efforts over nearly 60 years, all such proposed estimators have significantly suboptimal convergence compared to the bounds shown in [VV11]. Our main result, in some sense vindicating this insistence on linear estimators, is that for any property in this broad class there exists a near-optimal linear estimator. Additionally, we give a practical, polynomial-time algorithm for constructing such estimators for any given parameters.

While this result does not yield explicit bounds on the sample complexities of these estimation tasks, we leverage the insights it provides to give explicit constructions of near-optimal linear estimators for three properties: entropy, $L_1$ distance to uniformity, and, for pairs of distributions, $L_1$ distance. Our entropy estimator, when given $O(\frac{n}{\epsilon \log n})$ independent samples from a distribution of support size at most $n$, estimates the entropy of the distribution to within additive accuracy $\epsilon$, with probability of failure $o(1/\mathrm{poly}(n))$. By the recent lower bounds of [VV11], this estimator is optimal up to constant factors, both in its dependence on $n$ and in its dependence on $\epsilon$. In particular, the inverse-linear convergence rate of this estimator resolves the main open question of [VV11], which left open the possibility that the error decreases only as the square root of the number of samples.

Our distance-to-uniformity estimator, when given $O(\frac{m}{\epsilon^2 \log m})$ independent samples from any distribution, returns an $\epsilon$-accurate estimate of the $L_1$ distance to the uniform distribution of support $m$. This is constant-factor optimal for constant $\epsilon$. Finally, our framework extends naturally to properties of pairs of distributions, including estimating the $L_1$ distance and KL-divergence between pairs of distributions. We give an explicit linear estimator that estimates $L_1$ distance to additive accuracy $\epsilon$ using $O(\frac{n}{\epsilon^2 \log n})$ samples from each distribution, which is constant-factor optimal for constant $\epsilon$. This is the first sublinear-sample estimator for this fundamental property.
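To make the class of linear estimators concrete, below is a minimal Python sketch (our illustration, not code from the paper): compute the fingerprint of the sample and take its dot product with a fixed coefficient vector. The coefficient choices shown -- those of the classical plug-in estimators for entropy and for distance to uniformity -- are purely illustrative baselines known to be far from the near-optimal coefficient vectors constructed in the paper, which are not reproduced here.

```python
import math
from collections import Counter

def fingerprint(samples):
    """Fingerprint of a sample: F[j] = number of distinct domain
    elements observed exactly j times."""
    multiplicities = Counter(samples)        # element -> # occurrences
    return Counter(multiplicities.values())  # j -> F[j]

def linear_estimate(samples, coeff):
    """A linear estimator: the dot product of the fingerprint with a
    fixed coefficient vector, given here as a function j -> a_j."""
    return sum(coeff(j) * Fj for j, Fj in fingerprint(samples).items())

# Illustrative coefficients only -- NOT the paper's near-optimal
# construction. The classical plug-in (empirical) entropy estimator,
# in nats, is itself linear in the fingerprint, with coefficients
# a_j = -(j/k) * log(j/k) for a sample of size k.
def plugin_entropy_coeff(k):
    return lambda j: -(j / k) * math.log(j / k)

# Likewise, the plug-in estimate of L1 distance to the uniform
# distribution on m elements is affine in the fingerprint (a dot
# product plus a constant): each unseen element contributes 1/m,
# giving 1 + sum_j F[j] * (|j/k - 1/m| - 1/m).
def plugin_distance_to_uniformity(samples, m):
    k = len(samples)
    return 1 + sum(Fj * (abs(j / k - 1 / m) - 1 / m)
                   for j, Fj in fingerprint(samples).items())

samples = [1, 1, 2, 3, 3, 3, 4]
print(linear_estimate(samples, plugin_entropy_coeff(len(samples))))
print(plugin_distance_to_uniformity(samples, m=10))
```

In this framing, the entire design question is the choice of the coefficient vector $a_j$: the paper's main result is that for every property in the class some choice of coefficients yields a near-optimal estimator, and that such coefficients can be computed in polynomial time.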