Framework for real-time clustering over sliding windows
Sobhan Badiozamany, Kjell Orsborn, T. Risch
Proceedings of the 28th International Conference on Scientific and Statistical Database Management, 2016-07-18
DOI: 10.1145/2949689.2949696 (https://doi.org/10.1145/2949689.2949696)
Clustering queries over sliding windows require maintaining cluster memberships that change as the windows slide. To address this, the Generic 2-phase Continuous Summarization framework (G2CS) uses a generation-based window maintenance approach in which windows are maintained over different time intervals. It provides algorithm-independent and efficient sliding mechanisms for clustering queries, where the clustering algorithms are defined in terms of queries over cluster data represented as temporal tables. A particular challenge for real-time detection of a large number of rapidly evolving clusters is to support smooth re-clustering efficiently in real time, i.e., to minimize the sliding time as the window size increases and the stride decreases. To support such re-clustering efficiently for clustering algorithms that do not support deletion of expired data, e.g., BIRCH, G2CS includes a novel window maintenance mechanism called Sliding Binary Merge (SBM), which maintains several generations of intermediate window instances and does not require decremental cluster maintenance. To improve real-time sliding performance, G2CS uses generation-based multi-dimensional indexing. Extensive performance evaluation on both synthetic and real data shows that G2CS scales substantially better than related approaches.
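
To make the idea of sliding without decremental maintenance concrete, the following is a minimal sketch and not the authors' SBM algorithm, whose details the abstract does not give. It assumes a simpler pane-based scheme: each stride is summarized once as a BIRCH-style clustering feature (count, linear sum, sum of squared norms), the window summary is obtained by merging the per-stride summaries, and expired data leaves the window by dropping a whole summary rather than by deleting points. The names `CF` and `PaneWindow` are hypothetical, and every stride is assumed to contain at least one point.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class CF:
    """BIRCH-style clustering feature: count, linear sum, sum of squared norms."""
    n: int
    ls: Point
    ss: float

    @staticmethod
    def of(points: Iterable[Point]) -> "CF":
        # Assumption: the stride is non-empty, so the dimension can be read off.
        pts = list(points)
        dim = len(pts[0])
        ls = tuple(sum(p[d] for p in pts) for d in range(dim))
        ss = sum(x * x for p in pts for x in p)
        return CF(len(pts), ls, ss)

    def merge(self, other: "CF") -> "CF":
        # Clustering features are additive, so merging never needs raw points.
        return CF(
            self.n + other.n,
            tuple(a + b for a, b in zip(self.ls, other.ls)),
            self.ss + other.ss,
        )

    def centroid(self) -> Point:
        return tuple(x / self.n for x in self.ls)


class PaneWindow:
    """Hypothetical insert-only window: one summary per stride, merged on demand."""

    def __init__(self, panes_per_window: int) -> None:
        # deque(maxlen=...) discards the oldest pane when a new one is appended.
        self.panes: Deque[CF] = deque(maxlen=panes_per_window)

    def slide(self, new_points: Iterable[Point]) -> CF:
        self.panes.append(CF.of(new_points))
        panes = iter(self.panes)
        total = next(panes)
        for cf in panes:
            total = total.merge(cf)
        return total


window = PaneWindow(panes_per_window=2)
first = window.slide([(0.0, 0.0), (2.0, 2.0)])
second = window.slide([(4.0, 4.0)])
third = window.slide([(10.0, 10.0)])  # the first stride has now expired
print(second.centroid(), third.centroid())
```

In this sketch each slide merges all live panes, so its cost grows with the number of strides per window. Reducing that cost as windows grow and strides shrink is the problem the abstract attributes to SBM and its generations of intermediate window instances.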