Estimation for monotone sampling: competitiveness and customization
E. Cohen
doi:10.1145/2611462.2611485

Random samples are lossy summaries that allow queries posed over the data to be approximated by applying an appropriate estimator to the sample. The effectiveness of sampling, however, hinges on estimator selection. The choice of estimator is subject to global requirements, such as unbiasedness and range restrictions on the estimate value, and ideally we seek estimators that are both efficient to derive and apply and admissible (not dominated, in terms of variance, by other estimators). Nevertheless, for a given data domain, sampling scheme, and query, there are many admissible estimators. We define monotone sampling, which is implicit in many applications of massive data set analysis, and study the choice of admissible nonnegative and unbiased estimators. Our main contribution is general derivations of admissible estimators with desirable properties. We present a construction of order-optimal estimators, which minimize variance according to {\em any} specified priorities over the data domain. Order-optimality allows us to customize the derivation to common patterns that we can learn or observe in the data. When we prioritize lower values (e.g., more similar data sets when estimating difference), we obtain the L$^*$ estimator, which is the unique monotone admissible estimator and dominates the classic Horvitz-Thompson estimator. We show that the L$^*$ estimator is 4-competitive, meaning that the expectation of the square of the estimate, on any data, is at most $4$ times the minimum possible for that data. These properties make the L$^*$ estimator a natural default choice. We also present the U$^*$ estimator, which prioritizes large values (e.g., less similar data sets). Our estimator constructions are general, natural, and practical, allowing us to make the most of our summarized data.
Can quantum communication speed up distributed computation?
Michael Elkin, H. Klauck, Danupon Nanongkai, Gopal Pandurangan
doi:10.1145/2611462.2611488
The focus of this paper is on quantum distributed computation, where we investigate whether quantum communication can help speed up distributed network algorithms. Our main result is that for certain fundamental network problems, such as minimum spanning tree, minimum cut, and shortest paths, quantum communication does not substantially speed up distributed algorithms compared to the classical setting. To obtain this result, we extend the technique of Das Sarma et al. [SICOMP 2012] into a uniform approach for proving non-trivial lower bounds for quantum distributed algorithms for several graph optimization problems (both exact and approximate versions) as well as verification problems. Some of these bounds are new even in the classical setting, e.g., tight randomized lower bounds for Hamiltonian cycle and spanning tree verification, answering an open problem of Das Sarma et al., and a lower bound in terms of the weight aspect ratio, matching the upper bounds of Elkin [STOC 2004]. Our approach introduces the Server model and the Quantum Simulation Theorem, which together provide a connection between distributed algorithms and communication complexity. The Server model is the standard two-party communication complexity model augmented with additional power; yet most of the hardness in the two-party model carries over to this new model. The Quantum Simulation Theorem carries this hardness further to quantum distributed computing. Our techniques, except for the hardness proof in the Server model, require very little knowledge of quantum computing, which can help overcome a usual impediment to proving bounds on quantum distributed algorithms. In particular, if one can prove a lower bound for distributed algorithms for a certain problem using the technique of Das Sarma et al., it is likely that such a lower bound can be extended to the quantum setting using the tools provided in this paper, without requiring knowledge of quantum computing.
{"title":"Can quantum communication speed up distributed computation?","authors":"Michael Elkin, H. Klauck, Danupon Nanongkai, Gopal Pandurangan","doi":"10.1145/2611462.2611488","DOIUrl":"https://doi.org/10.1145/2611462.2611488","url":null,"abstract":"The focus of this paper is on quantum distributed computation, where we investigate whether quantum communication can help in speeding up distributed network algorithms. Our main result is that for certain fundamental network problems such as minimum spanning tree, minimum cut, and shortest paths, quantum communication does not help in substantially speeding up distributed algorithms for these problems compared to the classical setting. In order to obtain this result, we extend the technique of Das Sarma et al. [SICOMP 2012] to obtain a uniform approach to prove non-trivial lower bounds for quantum distributed algorithms for several graph optimization (both exact and approximate versions) as well as verification problems, some of which are new even in the classical setting, e.g. tight randomized lower bounds for Hamiltonian cycle and spanning tree verification, answering an open problem of Das Sarma et al., and a lower bound in terms of the weight aspect ratio, matching the upper bounds of Elkin [STOC 2004]. Our approach introduces the Server model and Quantum Simulation Theorem which together provide a connection between distributed algorithms and communication complexity. The Server model is the standard two-party communication complexity model augmented with additional power; yet, most of the hardness in the two-party model is carried over to this new model. The Quantum Simulation Theorem carries this hardness further to quantum distributed computing. Our techniques, except the proof of the hardness in the Server model, require very little knowledge in quantum computing, and this can help overcoming a usual impediment in proving bounds on quantum distributed algorithms. In particular, if one can prove a lower bound for distributed algorithms for a certain problem using the technique of Das Sarma et al., it is likely that such lower bound can be extended to the quantum setting using tools provided in this paper and without the need of knowledge in quantum computing.","PeriodicalId":186800,"journal":{"name":"Proceedings of the 2014 ACM symposium on Principles of distributed computing","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114431497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}