Multi-objective Weighted Sampling
E. Cohen
2015 Third IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb), published 2015-09-24
DOI: 10.1109/HotWeb.2015.8 (https://doi.org/10.1109/HotWeb.2015.8)
Citations: 23

Abstract

Key value data sets of the form {(x, w_x)} where w_x > 0 are prevalent. Common queries over such data are segment f-statistics Q(f, H) = Σ_{x∈H} f(w_x), specified for a segment H of the keys and a function f. Different choices of f correspond to count, sum, moments, capping, and threshold statistics. When the data set is large, we can compute a smaller sample from which we can quickly estimate statistics. A weighted sample of keys taken with respect to f(w_x) provides estimates with statistically guaranteed quality for f-statistics. Such a sample S(f) can be used to estimate g-statistics for g ≠ f, but quality degrades with the disparity between g and f. In this paper we address applications that require quality estimates for a set F of different functions. A naive solution is to compute and work with a different sample S(f) for each f ∈ F. Instead, this can be achieved more effectively and seamlessly using a single multi-objective sample S(F) of a much smaller size. We review multi-objective sampling schemes and place them in our context of estimating f-statistics. We show that a multi-objective sample for F provides quality estimates for any f that is a positive linear combination of functions from F. We then establish a surprising and powerful result when the target set M is all monotone non-decreasing functions, noting that M includes most natural statistics. We provide efficient multi-objective sampling algorithms for M and show that a sample size of k ln n (where n is the number of active keys) provides the same estimation quality, for any f ∈ M, as a dedicated weighted sample of size k for f.
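The following is a minimal Python sketch of the abstract's terms. It is an illustration under stated assumptions and is not the paper's algorithm. It computes segment f-statistics exactly, draws a dedicated weighted (Poisson PPS) sample S(f), and builds a multi-objective sample S(F) from shared per-key seeds by giving each key the maximum of its inclusion probabilities over f ∈ F. Estimates use inverse-probability (Horvitz-Thompson) weighting. Several pieces are assumptions: the threshold τ_f = Σ_x f(w_x)/k, every function name, and the synthetic data. The sketch does not attempt the paper's k ln n construction for all monotone functions.

```python
import random
from typing import Callable, Dict, List, Set

Weights = Dict[str, float]          # key -> w_x (assumed > 0)
StatFn = Callable[[float], float]   # f: w -> f(w) >= 0


def segment_statistic(data: Weights, f: StatFn, segment: Set[str]) -> float:
    """Exact segment f-statistic Q(f, H) = sum_{x in H} f(w_x)."""
    return sum(f(w) for x, w in data.items() if x in segment)


def pps_probabilities(data: Weights, f: StatFn, k: int) -> Dict[str, float]:
    """Inclusion probabilities p_x = min(1, f(w_x) / tau) for a dedicated sample S(f).

    Assumed threshold: tau = sum_x f(w_x) / k, which keeps the expected
    sample size at most k (simple, but not size-exact).
    """
    total = sum(f(w) for w in data.values())
    if total <= 0:
        return {x: 0.0 for x in data}
    tau = total / k
    return {x: min(1.0, f(w) / tau) for x, w in data.items()}


def multi_objective_probabilities(data: Weights, funcs: List[StatFn], k: int) -> Dict[str, float]:
    """Multi-objective probabilities: p_x = max over f in F of the dedicated p_x(f)."""
    per_f = [pps_probabilities(data, f, k) for f in funcs]
    return {x: max(p[x] for p in per_f) for x in data}


def draw_sample(probs: Dict[str, float], seeds: Dict[str, float]) -> Dict[str, float]:
    """Poisson sample with shared seeds u_x ~ U[0, 1): keep x iff u_x < p_x."""
    return {x: p for x, p in probs.items() if seeds[x] < p}


def ht_estimate(sample: Dict[str, float], data: Weights, g: StatFn, segment: Set[str]) -> float:
    """Inverse-probability (Horvitz-Thompson) estimate of Q(g, H) from a sample."""
    return sum(g(data[x]) / p for x, p in sample.items() if x in segment)


if __name__ == "__main__":
    rng = random.Random(7)
    data = {f"key{i}": rng.paretovariate(1.5) for i in range(1000)}  # synthetic heavy-tailed weights
    seeds = {x: rng.random() for x in data}                           # one shared seed per key
    count: StatFn = lambda w: 1.0
    total: StatFn = lambda w: w
    sample = draw_sample(multi_objective_probabilities(data, [count, total], k=50), seeds)
    segment = {x for x in data if x.endswith("7")}                    # an arbitrary segment H
    for name, g in [("count", count), ("sum", total)]:
        exact = segment_statistic(data, g, segment)
        print(name, len(sample), exact, ht_estimate(sample, data, g, segment))
```

Because the seeds are shared, every dedicated sample S(f) with f ∈ F is contained in S(F). Each key's probability in S(F) is also at least its probability under any f ∈ F. The Horvitz-Thompson variance Σ g(w_x)²(1 − p_x)/p_x shrinks as each p_x grows, so the estimate for each f ∈ F is no less accurate than one from its dedicated sample. Under this threshold choice, the expected size of S(F) is at most |F|·k.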