Multi-objective Weighted Sampling
E. Cohen
2015 Third IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb), published 2015-09-24
DOI: 10.1109/HotWeb.2015.8 (https://doi.org/10.1109/HotWeb.2015.8)

Key-value data sets of the form {(x, w_x)} where w_x > 0 are prevalent. Common queries over such data are segment f-statistics Q(f, H) = Σ_{x∈H} f(w_x), specified for a segment H of the keys and a function f. Different choices of f correspond to count, sum, moments, capping, and threshold statistics. When the data set is large, we can compute a smaller sample from which we can quickly estimate statistics. A weighted sample of keys taken with respect to f(w_x) provides estimates with statistically guaranteed quality for f-statistics. Such a sample S(f) can be used to estimate g-statistics for g ≠ f, but quality degrades with the disparity between g and f. In this paper we address applications that require quality estimates for a set F of different functions. A naive solution is to compute and work with a different sample S(f) for each f ∈ F. Instead, this can be achieved more effectively and seamlessly using a single multi-objective sample S(F) of a much smaller size. We review multi-objective sampling schemes and place them in our context of estimating f-statistics. We show that a multi-objective sample for F provides quality estimates for any f that is a positive linear combination of functions from F. We then establish a surprising and powerful result when the target set M is all monotone non-decreasing functions, noting that M includes most natural statistics. We provide efficient multi-objective sampling algorithms for M and show that a sample size of k ln n (where n is the number of active keys) provides the same estimation quality, for any f ∈ M, as a dedicated weighted sample of size k for f.
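
The abstract does not spell out a sampling scheme, so the following is only a minimal Python sketch of the general idea, under stated assumptions. It assumes Poisson sampling with probability proportional to f(w_x), using the simplified form p_x = min(1, k·f(w_x) / Σ_y f(w_y)); it assumes a multi-objective sample for F can be formed by giving each key the maximum of its per-f probabilities; and it assumes an inverse-probability estimator for Q(g, H). All function names are hypothetical and are not taken from the paper.

```python
import random


def pps_probabilities(weights, f, k):
    """Per-key inclusion probabilities p_x = min(1, k * f(w_x) / sum_y f(w_y)).

    Simplified PPS (assumption): the expected sample size is at most k.
    """
    total = sum(f(w) for w in weights.values())
    if total == 0:
        # f vanishes on every key, so no key needs to be sampled for f.
        return {x: 0.0 for x in weights}
    return {x: min(1.0, k * f(w) / total) for x, w in weights.items()}


def multi_objective_probabilities(weights, funcs, k):
    """Assumed construction: take the maximum per-f probability for each key."""
    per_f = [pps_probabilities(weights, f, k) for f in funcs]
    return {x: max(p[x] for p in per_f) for x in weights}


def draw_sample(probs, rng):
    """Poisson sampling: include each key independently with probability p_x.

    The sample stores p_x with each key so the estimator can reweight by it.
    """
    return {x: p for x, p in probs.items() if rng.random() < p}


def estimate(sample, weights, g, segment):
    """Inverse-probability estimate of Q(g, H) = sum over x in H of g(w_x)."""
    return sum(g(weights[x]) / p for x, p in sample.items() if x in segment)


def demo():
    # Hypothetical data: 1000 keys with weights 1..1000.
    weights = {i: float(i) for i in range(1, 1001)}
    funcs = [
        lambda w: 1.0,           # count
        lambda w: w,             # sum
        lambda w: min(w, 100.0), # capping at 100
    ]
    segment = {i for i in weights if i % 2 == 0}  # segment H: even keys

    probs = multi_objective_probabilities(weights, funcs, k=50)
    sample = draw_sample(probs, random.Random(1))

    for name, g in zip(["count", "sum", "cap100"], funcs):
        exact = sum(g(weights[x]) for x in segment)
        print(name, "exact:", exact, "estimate:", estimate(sample, weights, g, segment))


if __name__ == "__main__":
    demo()
```

Under this construction the expected size of the combined sample is at most k·|F|, and it is smaller when the per-f samples favor the same keys. The estimate is unbiased for any g whose nonzero values fall only on keys with positive inclusion probability.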