Enhancing CluStream Algorithm for Clustering Big Data Streaming over Sliding Window
D. Sayed, S. Rady, M. Aref
2020 12th International Conference on Electrical Engineering (ICEENG), published 2020-07-01
DOI: 10.1109/ICEENG45378.2020.9171705
https://doi.org/10.1109/ICEENG45378.2020.9171705
Citations: 4
Abstract
Data stream mining has become an active research topic. Its main challenge is extracting knowledge in real time from an immense data stream in a single scan. Clustering is a significant task in data stream processing. This paper introduces SCluStream, an algorithm for determining clusters over a sliding window to address these challenges. The algorithm improves on CluStream, which does not use the sliding window concept. In the sliding window model, only the most recent data is used and old data is discarded, which allows faster execution. An improved clustering technique is also incorporated, which contributes to higher accuracy. The proposed algorithm was tested on two real datasets: the charitable donation dataset and the forest cover type dataset. The results show that SCluStream is more efficient than CluStream for clustering big data streams in terms of accuracy, execution time, and memory usage.
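The sliding window idea described in the abstract can be illustrated with a small sketch. This is not the authors' SCluStream implementation. It is a minimal example that assumes a count-based window (the abstract does not say whether the window is count-based or time-based) and keeps an additive summary of count, linear sum, and squared sum, in the spirit of the cluster-feature summaries used by CluStream-style micro-clusters. The class name `SlidingWindowFeature` and its parameters `size` and `dim` are hypothetical.

```python
from collections import deque


class SlidingWindowFeature:
    """Additive summary (count, linear sum, squared sum) of the last `size` points.

    Hypothetical illustration only: a count-based window is assumed here.
    """

    def __init__(self, size, dim):
        self.size = size
        self.points = deque()            # raw points currently inside the window
        self.n = 0                       # number of points in the window
        self.linear_sum = [0.0] * dim    # per-dimension sum of values
        self.squared_sum = [0.0] * dim   # per-dimension sum of squared values

    def add(self, point):
        """Insert a new point and evict the oldest one if the window overflows."""
        self.points.append(point)
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.squared_sum[i] += x * x
        if self.n > self.size:
            old = self.points.popleft()
            self.n -= 1
            # The summary is additive, so eviction is a simple subtraction.
            for i, x in enumerate(old):
                self.linear_sum[i] -= x
                self.squared_sum[i] -= x * x

    def centroid(self):
        """Mean of the points currently in the window."""
        if self.n == 0:
            raise ValueError("window is empty")
        return [s / self.n for s in self.linear_sum]


window = SlidingWindowFeature(size=3, dim=2)
for p in [(0.0, 0.0), (2.0, 2.0), (4.0, 4.0), (6.0, 6.0)]:
    window.add(p)
print(window.n, window.centroid())
```

Because the summary is additive, evicting old data costs the same as inserting new data, which is one way a windowed model can bound both memory and per-point processing time.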