A Novel Clustering Framework for Stream Data / Un nouveau cadre de classifications pour les données de flux

Hadi Tajali Zadeh, Reza Boostani
DOI: 10.1109/CJECE.2018.2885326 (https://doi.org/10.1109/CJECE.2018.2885326)
Journal: Canadian Journal of Electrical and Computer Engineering (Revue Canadienne de Génie Électrique et Informatique)
Published: 2019-04-02 · Journal Article
Citations: 5

Abstract

There is growing interest in real-time clustering of continuous stream data. A few attempts have been made to improve the offline phase of stream clustering methods, whereas in their online phase these methods almost always rely on a simple distance function. In practice, clusters have complex shapes; therefore, measuring the distance of an incoming sample to the mean of an asymmetric microcluster can mislead the sample into an irrelevant microcluster. In this paper, a novel framework is proposed that can enhance the online phase of any stream clustering method. Specifically, for each microcluster whose population exceeds a threshold, a dedicated classifier is trained to capture its boundary and statistical properties, and incoming samples are assigned to microclusters according to the classifiers' scores. The incremental Naïve Bayes classifier is chosen for its fast learning. Two state-of-the-art methods, DenStream and CluStream, were evaluated on nine synthetic and real data sets, with and without the proposed framework. Comparative results in terms of purity, general recall, general precision, concept-change traceability, computational complexity, and robustness against noise indicate that the modified methods outperform their original versions.
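The abstract describes the online-phase change only at a high level. The following is a minimal Python sketch under our own assumptions, not the authors' implementation: each microcluster keeps an incremental diagonal-Gaussian Naive Bayes model (Welford updates per feature), a sample is scored by log prior plus per-feature Gaussian log-likelihoods, and microclusters below a population threshold fall back to distance to the mean. The names `MicroCluster`, `assign`, `nearest_mean`, `build`, `MIN_POP`, `VAR_FLOOR`, `min_log_score`, and `max_dist` are hypothetical. The demo contrasts the baseline nearest-mean rule with classifier-scored assignment for a point lying inside an elongated microcluster.

```python
import math

MIN_POP = 5       # hypothetical: population at which a microcluster gets its own classifier
VAR_FLOOR = 1e-6  # hypothetical: keeps per-feature variances strictly positive


class MicroCluster:
    """Microcluster with an incremental diagonal-Gaussian Naive Bayes model (assumed design)."""

    def __init__(self, x):
        self.n = 0
        self.mean = [0.0] * len(x)
        self.m2 = [0.0] * len(x)  # per-feature sum of squared deviations (Welford)
        self.add(x)

    def add(self, x):
        """Absorb one sample, updating per-feature mean and variance in O(d) time."""
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def has_classifier(self):
        return self.n >= MIN_POP

    def log_score(self, x, total):
        """log P(cluster) + sum over features of log N(x_i; mean_i, var_i)."""
        score = math.log(self.n / total)
        for xi, mu, m2 in zip(x, self.mean, self.m2):
            var = max(m2 / (self.n - 1), VAR_FLOOR)  # sample variance, floored
            score -= 0.5 * math.log(2 * math.pi * var) + (xi - mu) ** 2 / (2 * var)
        return score

    def distance(self, x):
        return math.dist(x, self.mean)


def build(points):
    """Create a microcluster from a list of points (demo helper)."""
    mc = MicroCluster(points[0])
    for p in points[1:]:
        mc.add(p)
    return mc


def nearest_mean(clusters, x):
    """Baseline online rule: index of the microcluster whose mean is closest to x."""
    return min(range(len(clusters)), key=lambda i: clusters[i].distance(x))


def assign(clusters, x, min_log_score=-10.0, max_dist=1.0):
    """Classifier-scored assignment; returns the index of the microcluster that absorbs x.

    min_log_score and max_dist are hypothetical acceptance thresholds.
    """
    total = sum(c.n for c in clusters)
    scored = [(c.log_score(x, total), i) for i, c in enumerate(clusters) if c.has_classifier()]
    if scored:
        best_score, best = max(scored)
        if best_score >= min_log_score:
            clusters[best].add(x)
            return best
    # Young microclusters have no classifier yet: fall back to distance to the mean.
    young = [i for i, c in enumerate(clusters) if not c.has_classifier()]
    if young:
        best = min(young, key=lambda i: clusters[i].distance(x))
        if clusters[best].distance(x) <= max_dist:
            clusters[best].add(x)
            return best
    clusters.append(MicroCluster(x))  # no microcluster accepts x: open a new one
    return len(clusters) - 1


if __name__ == "__main__":
    # Elongated microcluster A (spread along x) and compact microcluster B.
    a = build([(-4, 0.1), (-4, -0.1), (-2, 0.1), (-2, -0.1), (0, 0.1),
               (0, -0.1), (2, 0.1), (2, -0.1), (4, 0.1), (4, -0.1)])
    b = build([(4.9, 1.4), (5.1, 1.6), (4.9, 1.6), (5.1, 1.4), (5.0, 1.5)])
    clusters = [a, b]
    x = (3.8, 0.0)
    print("nearest mean:", nearest_mean(clusters, x))  # 1: B's mean is closer
    print("classifier score:", assign(clusters, x))    # 0: x lies inside A's spread
```

In a DenStream or CluStream setting, the fallback branches would presumably be replaced by the base algorithm's own rules (radius or weight thresholds, outlier buffers, decay), which this sketch omits; only the scoring of mature microclusters illustrates the idea described in the abstract.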

Source journal: Canadian Journal of Electrical and Computer Engineering
Self-citation rate: 0.00%
Articles published: 27
Journal description: The Canadian Journal of Electrical and Computer Engineering (ISSN 0840-8688), issued quarterly, has been publishing high-quality refereed scientific papers in all areas of electrical and computer engineering since 1976.

Latest articles in this journal:
Design and Construction of an Advanced Tracking Wheel for Insulator Materials Testing
Implementation of Ultrahigh-Speed Decimators
Noncoherent Distributed Beamforming in Decentralized Two-Way Relay Networks
Fetal ECG Extraction Using Input-Mode and Output-Mode Adaptive Filters With Blind Source Separation
Design Consideration to Achieve Wide-Speed-Range Operation in a Switched Reluctance Motor