{"title":"A Novel Clustering Framework for Stream Data","authors":"Hadi Tajali Zadeh, Reza Boostani","doi":"10.1109/CJECE.2018.2885326","DOIUrl":null,"url":null,"PeriodicalId":55287,"journal":{"name":"Canadian Journal of Electrical and Computer Engineering-Revue Canadienne De Genie Electrique et Informatique","volume":null,"pages":null},"PeriodicalIF":1.7000,"publicationDate":"2019-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CJECE.2018.2885326","citationCount":"5","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Canadian Journal of Electrical and Computer Engineering-Revue Canadienne De Genie Electrique et Informatique","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CJECE.2018.2885326","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Engineering","Score":null,"Total":0}
A Novel Clustering Framework for Stream Data (Un nouveau cadre de classifications pour les données de flux)
There is a growing interest in real-time clustering of continuous stream data. A few attempts have been made to improve the off-line phase of stream clustering methods, whereas these methods mostly rely on a simple distance function in their online phase. In practice, clusters have complex shapes; therefore, measuring the distance from an incoming sample to the mean of an asymmetric microcluster may assign that sample to an irrelevant microcluster. In this paper, a novel framework is proposed that can enhance the online phase of all stream clustering methods. For each microcluster whose population exceeds a threshold, a dedicated classifier is trained to capture its boundary and statistical properties. Incoming samples are then assigned to the microclusters according to the classifiers' scores. Here, the incremental Naïve Bayes classifier is chosen because of its fast learning property. DenStream and CluStream were chosen as state-of-the-art methods, and their performance was assessed over nine synthetic and real data sets, with and without the proposed framework. The comparative results in terms of purity, general recall, general precision, concept change traceability, computational complexity, and robustness against noise indicate that the modified methods outperform their original versions.
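The assignment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian per-feature Naïve Bayes model updated with Welford's running statistics, and the names `MicroCluster`, `assign`, `min_population`, and `var_floor`, the threshold value, and the nearest-mean fallback for small microclusters are all hypothetical choices made for illustration.

```python
import math


class MicroCluster:
    """Running per-dimension mean and variance (Welford's method).

    The statistics double as a Gaussian Naive Bayes model for this
    microcluster; the Gaussian form is an assumption of this sketch.
    """

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # per-dimension sum of squared deviations

    def update(self, x):
        """Absorb one sample incrementally, without storing past samples."""
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def log_score(self, x, total, var_floor=1e-6):
        """Log prior plus the sum of per-dimension Gaussian log-likelihoods."""
        score = math.log(self.n / total)
        for i, xi in enumerate(x):
            var = max(self.m2[i] / self.n, var_floor)  # floor avoids division by zero
            score -= 0.5 * math.log(2 * math.pi * var)
            score -= (xi - self.mean[i]) ** 2 / (2 * var)
        return score

    def distance(self, x):
        """Plain Euclidean distance to the mean (the baseline rule)."""
        return math.dist(x, self.mean)


def assign(x, clusters, min_population=5):
    """Assign x to a microcluster, update that microcluster, return its index.

    Microclusters with at least min_population samples are scored by their
    classifier. If none qualifies, fall back to the nearest mean (a
    hypothetical rule; the abstract does not specify this case).
    """
    mature = [k for k, c in enumerate(clusters) if c.n >= min_population]
    if mature:
        total = sum(clusters[k].n for k in mature)
        best = max(mature, key=lambda k: clusters[k].log_score(x, total))
    else:
        best = min(range(len(clusters)), key=lambda k: clusters[k].distance(x))
    clusters[best].update(x)
    return best


# Toy data: a is elongated along the x-axis, b is compact around (5, 2).
a, b = MicroCluster(2), MicroCluster(2)
for p in [(-4, 0.1), (-2, -0.1), (0, 0.1), (2, -0.1), (4, 0.1), (0, -0.1)]:
    a.update(p)
for p in [(5, 2.2), (5.2, 2), (4.8, 2), (5, 1.8), (5, 2)]:
    b.update(p)

x = (3.5, 0.0)  # lies along a's long axis but is closer to b's mean
nearest = min((0, 1), key=lambda k: [a, b][k].distance(x))
chosen = assign(x, [a, b])
print(nearest, chosen)  # prints "1 0"
```

In this toy case the nearest-mean rule picks the compact microcluster `b`, while the classifier score picks the elongated microcluster `a`, because `a`'s large variance along the x-axis makes the sample plausible there. This is the kind of asymmetric-shape misassignment the abstract describes.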
About the journal:
The Canadian Journal of Electrical and Computer Engineering (ISSN 0840-8688), issued quarterly, has published high-quality refereed scientific papers in all areas of electrical and computer engineering since 1976.