Narongrid Tangpathompong, U. Suksawatchon, J. Suksawatchon
Proceedings of the 2nd International Conference on Intelligent Information Processing, published 2017-07-17. DOI: 10.1145/3144789.3144818 (https://doi.org/10.1145/3144789.3144818)
The Dynamic Hyper-ellipsoidal Micro-Clustering for Evolving Data Stream Using Only Incoming Datum
Data stream clustering is becoming an efficient way to cluster massive online data. The task requires a process that can partition data continuously using incremental learning. In this paper, we present a new clustering method, DyHEMstream, which consists of an online phase and an offline phase. In the online phase, dynamic hyper-ellipsoidal micro-clusters are proposed to keep summary information about the evolving data stream, updated using only each new incoming data sample. The shape of the proposed micro-cluster can represent the incoming data better than a traditional micro-cluster. The algorithm processes each data point in a one-pass fashion without storing the entire data set. In the offline phase, the final clusters are formed by expanding the hyper-ellipsoidal micro-clusters. DyHEMstream is evaluated on various synthetic data sets using several quality metrics and compared with a well-known data stream clustering algorithm, DenStream. Based on purity, Rand index, and Jaccard index, DyHEMstream achieves better clustering quality than DenStream on noisy data containing clusters of different shapes, sizes, and densities.
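The abstract does not state the update rules, so the following is only a minimal sketch of the general idea of a one-pass ellipsoidal summary, not the DyHEMstream algorithm itself. It assumes a micro-cluster that stores a count, a mean, and a scatter matrix, updates them with a Welford-style recurrence from each incoming point alone, and decides membership with a squared Mahalanobis distance. The class name `EllipsoidalMicroCluster`, the function `process_stream`, and the parameters `init_var` and `threshold_sq` are hypothetical and chosen for illustration.

```python
import numpy as np


class EllipsoidalMicroCluster:
    """Hypothetical one-pass summary: point count, mean, and scatter matrix."""

    def __init__(self, first_point, init_var=1.0):
        x = np.asarray(first_point, dtype=float)
        self.n = 1
        self.mean = x.copy()
        # Running sum of outer products of deviations from the mean.
        self.scatter = np.zeros((x.size, x.size))
        # Assumption: an isotropic prior keeps the covariance invertible
        # while the cluster still holds very few points.
        self.init_var = init_var

    def covariance(self):
        """Regularized covariance estimate (assumed form, not from the paper)."""
        identity = np.eye(self.mean.size)
        return (self.scatter + self.init_var * identity) / self.n

    def mahalanobis_sq(self, point):
        """Squared Mahalanobis distance from the cluster mean to `point`."""
        diff = np.asarray(point, dtype=float) - self.mean
        return float(diff @ np.linalg.solve(self.covariance(), diff))

    def absorb(self, point):
        """Welford-style update that uses only the incoming point."""
        x = np.asarray(point, dtype=float)
        self.n += 1
        delta = x - self.mean              # deviation from the old mean
        self.mean += delta / self.n
        self.scatter += np.outer(delta, x - self.mean)  # uses the new mean


def process_stream(points, threshold_sq=9.0):
    """Assign each point to the nearest ellipsoid, or open a new micro-cluster.

    `threshold_sq` is an assumed cutoff on the squared Mahalanobis distance.
    """
    clusters = []
    for p in points:
        if clusters:
            dists = [c.mahalanobis_sq(p) for c in clusters]
            best = int(np.argmin(dists))
            if dists[best] <= threshold_sq:
                clusters[best].absorb(p)
                continue
        clusters.append(EllipsoidalMicroCluster(p))
    return clusters
```

Each point is read once and then discarded; only the per-cluster statistics are kept, which is what allows processing without storing the entire data set. An offline step that merges or expands these ellipsoids into final clusters is not shown, since the abstract gives no detail on it.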