Dynamic Count-Min Sketch for Analytical Queries Over Continuous Data Streams
Xiaobo Zhu, Guangjun Wu, Hong Zhang, Shupeng Wang, Bingnan Ma
2018 IEEE 25th International Conference on High Performance Computing (HiPC), published 2018-12-01
DOI: 10.1109/HiPC.2018.00033 (https://doi.org/10.1109/HiPC.2018.00033)
Citation count: 1
Abstract
Approximate query processing methods have been proposed for analytics over high-speed data streams; they compact continuous streams into a space-constrained sketch and provide reliable estimates for a range of queries. Count-Min (CM) is the state-of-the-art sketching structure, supporting many queries with error-guaranteed estimates under limited space. However, CM requires the counter table to be created beforehand according to the size of the data stream, and this size is usually unpredictable for dynamic data streams. In this paper, we propose an approach called Dynamic Count-Min sketch (DCM), which is suited to dynamic data sets and provides accurate estimates for point queries and self-join size queries. Our approach is composed of incremental CM sketches and allocates space in a pay-as-you-go manner. Our mathematical analysis and extensive experiments both show that the approach is appropriate for data sets with dynamic or skewed inputs and provides error-guaranteed estimates using less space than CM.
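To make the two query types concrete, the following is a minimal Python sketch of a standard Count-Min table with a point query and a self-join size estimate. The abstract does not give the DCM construction, so the `LayeredSketch` class is only a hypothetical illustration of pay-as-you-go allocation (a new fixed-size CM sketch is opened once the current one has absorbed `capacity` updates); it should not be read as the authors' algorithm. The class names, the `capacity` parameter, and the salted BLAKE2b hashing (a practical stand-in for a pairwise-independent hash family) are all assumptions made for this example.

```python
import hashlib
import math


class CountMinSketch:
    """Standard Count-Min sketch: `depth` rows of `width` counters."""

    def __init__(self, epsilon=0.01, delta=0.01):
        # Usual sizing: width = ceil(e / epsilon), depth = ceil(ln(1 / delta)).
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0  # sum of all update counts absorbed so far

    def _index(self, item, row):
        # Assumption: one salted BLAKE2b digest per row stands in for
        # a family of pairwise-independent hash functions.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count
        self.total += count

    def point_query(self, item):
        # Minimum over rows; never underestimates with non-negative updates.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )

    def self_join_size(self):
        # Minimum over rows of the sum of squared counters; an upper
        # bound on the true self-join size (sum of squared frequencies).
        return min(sum(c * c for c in row) for row in self.table)


class LayeredSketch:
    """Hypothetical pay-as-you-go wrapper; NOT the DCM algorithm of the paper."""

    def __init__(self, capacity=1000, epsilon=0.01, delta=0.01):
        self.capacity = capacity  # assumed per-layer update budget
        self.epsilon = epsilon
        self.delta = delta
        self.layers = [CountMinSketch(epsilon, delta)]

    def update(self, item, count=1):
        # Allocate a fresh layer only when the newest one is full.
        if self.layers[-1].total >= self.capacity:
            self.layers.append(CountMinSketch(self.epsilon, self.delta))
        self.layers[-1].update(item, count)

    def point_query(self, item):
        # Each layer overestimates its share, so the sum is still
        # an upper bound on the true frequency.
        return sum(layer.point_query(item) for layer in self.layers)


if __name__ == "__main__":
    cms = CountMinSketch()
    for word in ["a", "a", "b", "a", "c"]:
        cms.update(word)
    print(cms.point_query("a"), cms.self_join_size())
```

In this illustration the layered wrapper answers only point queries. A self-join estimate across layers would also have to account for items that appear in more than one layer, which this example does not attempt.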