V. Braverman, Stephen R. Chestnut, Nikita Ivkin, David P. Woodruff
Proceedings of the forty-eighth annual ACM symposium on Theory of Computing. DOI: 10.1145/2897518.2897558 (https://doi.org/10.1145/2897518.2897558). Publication date: 2015-11-02.
Beating CountSketch for heavy hitters in insertion streams
Given a stream $p_1, \ldots, p_m$ of items from a universe $U$, which, without loss of generality, we identify with the set of integers $\{1, 2, \ldots, n\}$, we consider the problem of returning all $\ell_2$-heavy hitters, i.e., those items $j$ for which $f_j \ge \epsilon \sqrt{F_2}$, where $f_j$ is the number of occurrences of item $j$ in the stream, and $F_2 = \sum_{i \in [n]} f_i^2$. Such a guarantee is considerably stronger than the $\ell_1$-guarantee, which finds those $j$ for which $f_j \ge \epsilon m$. In 2002, Charikar, Chen, and Farach-Colton suggested the CountSketch data structure, which finds all such $j$ using $\Theta(\log^2 n)$ bits of space (for constant $\epsilon > 0$). The only known lower bound is $\Omega(\log n)$ bits of space, which comes from the need to specify the identities of the items found. In this paper we show one can achieve $O(\log n \log\log n)$ bits of space for this problem. Our techniques, based on Gaussian processes, lead to a number of other new results for data streams, including: (1) The first algorithm for estimating $F_2$ simultaneously at all points in a stream using only $O(\log n \log\log n)$ bits of space, improving a natural union bound. (2) A way to estimate the $\ell_\infty$ norm of a stream up to additive error $\epsilon \sqrt{F_2}$ with $O(\log n \log\log n)$ bits of space, resolving Open Question 3 from the IITK 2006 list for insertion-only streams.
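To make the two guarantees concrete, the following is a minimal Python sketch of the definitions above together with a simplified version of the CountSketch baseline that the abstract compares against. It is not the paper's $O(\log n \log\log n)$ algorithm. The names `exact_heavy_hitters`, `CountSketch`, `depth`, and `width` are illustrative assumptions, and the BLAKE2 digest used for hashing is a convenient stand-in for the limited-independence hash families of the original structure. The example stream has 999 items that occur once and one item that occurs 101 times, so the item is $\ell_2$-heavy for $\epsilon = 0.5$ but far from $\ell_1$-heavy.

```python
import hashlib
import math
import random
from collections import Counter
from statistics import median


def exact_heavy_hitters(stream, eps):
    """Exact l1- and l2-heavy hitters from full frequency counts (linear space)."""
    freq = Counter(stream)
    m = len(stream)                         # stream length, the sum of all f_j
    f2 = sum(c * c for c in freq.values())  # second moment F_2
    l1 = {j for j, c in freq.items() if c >= eps * m}
    l2 = {j for j, c in freq.items() if c >= eps * math.sqrt(f2)}
    return l1, l2


class CountSketch:
    """Simplified CountSketch: `depth` rows, each with `width` signed counters."""

    def __init__(self, depth, width):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, row, item):
        # Assumption: a BLAKE2 digest replaces the pairwise-independent
        # hash functions used in the analysis of the real data structure.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        bucket = h % self.width              # low bits choose the counter
        sign = 1 if (h >> 63) & 1 else -1    # top bit chooses the sign
        return bucket, sign

    def update(self, item, count=1):
        """Process `count` insertions of `item`."""
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            self.table[row][bucket] += sign * count

    def estimate(self, item):
        """Median over rows of the signed counter that `item` hashes to."""
        guesses = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            guesses.append(sign * self.table[row][bucket])
        return median(guesses)


# Insertion-only stream: items 1..1000 once each, plus 100 extra copies of 42.
n = 1000
stream = list(range(1, n + 1)) + [42] * 100
random.Random(1).shuffle(stream)

eps = 0.5
l1, l2 = exact_heavy_hitters(stream, eps)  # l1 is empty, l2 == {42}

sketch = CountSketch(depth=5, width=64)
for item in stream:
    sketch.update(item)

print("l1-heavy:", l1, "l2-heavy:", l2)
print("true f_42 = 101, sketch estimate:", sketch.estimate(42))
```

Here $m = 1100$ and $F_2 = 999 + 101^2 = 11200$, so the $\ell_1$ threshold is $550$ while the $\ell_2$ threshold is about $52.9$. In the standard analysis, CountSketch uses $O(\log n)$ rows of counters that each take $O(\log n)$ bits for constant $\epsilon$, which is the source of the $\Theta(\log^2 n)$ bound quoted above.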