Anytime clustering of data streams while handling noise and concept drift

Journal of Experimental & Theoretical Artificial Intelligence · Published 2021-03-15 · DOI: 10.1080/0952813X.2021.1882001 · Impact factor 1.7 · Zone 4 (Computer Science) · JCR Q3 (Computer Science, Artificial Intelligence)
Jagat Sesh Challa, Poonam Goyal, Ajinkya Kokandakar, D. Mantri, Pranet Verma, S. Balasubramaniam, Navneet Goyal
Citations: 2

Abstract

Clustering of data streams has become very popular in recent times, owing to the rapid rise of real-time streaming utilities that produce large amounts of data at varying inter-arrival rates. We propose AnyClus, a framework for anytime clustering of data streams. AnyClus uses AnyRTree, a proposed variant of the R-tree, to capture incoming stream objects arriving at variable rates and to index them as hierarchical micro-clusters. The resulting leaf-level micro-clusters are aggregated and stored in a logarithmic tilted-time window framework (TTWF). Our extensive experimental analysis shows (i) the capability of AnyClus to handle variable stream speeds (up to 250k objects/second); (ii) its ability to produce micro-clusters of high purity (≈1) and compactness; and (iii) the effectiveness of AnyRTree, compared with existing methods, in handling noise, capturing concept drift, and preserving spatial locality when indexing micro-clusters. We also propose a parallel framework, Any-MP-Clus, for anytime clustering of multiport data streams over commodity clusters. Any-MP-Clus uses an AnyRTree at each computing node of the cluster (for each stream port) and maintains the aggregated micro-clusters in TTWF. Experimental results on billion-scale datasets show that Any-MP-Clus is scalable and efficient and produces higher-quality clusterings.
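The abstract describes leaf-level micro-clusters being aggregated into a logarithmic tilted-time window framework only at a high level. As a rough illustration of those two ideas, the sketch below assumes that a micro-cluster is an additive cluster-feature summary (count, linear sum, squared sum, in the style of BIRCH/CluStream) and that the TTWF keeps at most two windows per level, with each level doubling the time span it covers. The names `MicroCluster`, `merge_snapshots`, `LogTiltedTimeWindow`, and the integer micro-cluster ids are hypothetical; this is not the authors' AnyRTree or TTWF implementation.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


class MicroCluster:
    """Additive cluster-feature summary (count, linear sum, squared sum).

    Assumption: a BIRCH/CluStream-style summary, used only to illustrate
    what a leaf-level micro-cluster might store.
    """

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.ls = [0.0] * dim  # per-dimension linear sum
        self.ss = [0.0] * dim  # per-dimension sum of squares

    def insert(self, x: Sequence[float]) -> None:
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def merged(self, other: "MicroCluster") -> "MicroCluster":
        # Summaries are additive, so merging is component-wise addition.
        out = MicroCluster(len(self.ls))
        out.n = self.n + other.n
        out.ls = [a + b for a, b in zip(self.ls, other.ls)]
        out.ss = [a + b for a, b in zip(self.ss, other.ss)]
        return out

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of the members from the centroid.
        var = sum(q / self.n - (s / self.n) ** 2 for s, q in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))


# Hypothetical snapshot: micro-cluster id -> summary for one time window.
Snapshot = Dict[int, MicroCluster]


def merge_snapshots(a: Snapshot, b: Snapshot) -> Snapshot:
    # Union of both windows; micro-clusters sharing an id are merged.
    out = dict(a)
    for key, mc in b.items():
        out[key] = out[key].merged(mc) if key in out else mc
    return out


class LogTiltedTimeWindow:
    """Level k keeps at most `per_level` windows, each spanning 2**k time units."""

    def __init__(self, per_level: int = 2) -> None:
        self.per_level = per_level
        self.levels: List[List[Tuple[int, Snapshot]]] = []  # each level newest first

    def add(self, snapshot: Snapshot) -> None:
        carry: Optional[Tuple[int, Snapshot]] = (1, snapshot)
        k = 0
        while carry is not None:
            if k == len(self.levels):
                self.levels.append([])
            self.levels[k].insert(0, carry)
            carry = None
            if len(self.levels[k]) > self.per_level:
                # Merge the two oldest windows into one twice as long and
                # push it to the next (coarser) level.
                span_a, old_a = self.levels[k].pop()
                span_b, old_b = self.levels[k].pop()
                carry = (span_a + span_b, merge_snapshots(old_b, old_a))
            k += 1

    def windows(self) -> List[Tuple[int, Snapshot]]:
        # Newest (finest) windows first, oldest (coarsest) last.
        return [w for level in self.levels for w in level]


if __name__ == "__main__":
    ttw = LogTiltedTimeWindow()
    for t in range(8):
        mc = MicroCluster(2)
        mc.insert([float(t), 0.0])
        ttw.add({0: mc})
    print([span for span, _ in ttw.windows()])  # [1, 1, 2, 4]
```

Because the summaries are additive, two windows can be merged without revisiting raw stream objects. Under this assumed scheme, the number of stored windows grows roughly with the logarithm of elapsed time, while the most recent history stays at the finest granularity.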
Source journal
CiteScore: 6.10
Self-citation rate: 4.50%
Articles published: 89
Review time: >12 weeks
About the journal: Journal of Experimental & Theoretical Artificial Intelligence (JETAI) is a world-leading journal dedicated to publishing high-quality, rigorously reviewed, original papers in artificial intelligence (AI) research. The journal features work in all subfields of AI research and accepts both theoretical and applied research. Topics covered include, but are not limited to, the following: • cognitive science • games • learning • knowledge representation • memory and neural system modelling • perception • problem-solving