{"title":"子空间随时流聚类","authors":"Marwan Hassani, P. Kranen, Rajveer Saini, T. Seidl","doi":"10.1145/2618243.2618286","DOIUrl":null,"url":null,"abstract":"Clustering of high dimensional streaming data is an emerging field of research. A real life data stream imposes many challenges on the clustering task, as an endless amount of data arrives constantly. A lot of research has been done in the full space stream clustering. To handle the varying speeds of the data stream, \"anytime\" algorithms are proposed but so far only in full space stream clustering. However, data streams from many application domains contain abundance of dimensions; the clusters often exist only in specific subspaces (subset of dimensions) and do not show up in the full feature space. In this paper, the first algorithm that considers both the high dimensionality and the varying speeds of streaming data, is proposed. The algorithm, called SubClusTree, can flexibly adapt to the different stream speeds and makes the best use of available time to provide a high quality subspace clustering. The experimental results prove the effectiveness of our anytime subspace concept.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"50 1","pages":"37:1-37:4"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":"{\"title\":\"Subspace anytime stream clustering\",\"authors\":\"Marwan Hassani, P. Kranen, Rajveer Saini, T. Seidl\",\"doi\":\"10.1145/2618243.2618286\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Clustering of high dimensional streaming data is an emerging field of research. A real life data stream imposes many challenges on the clustering task, as an endless amount of data arrives constantly. A lot of research has been done in the full space stream clustering. To handle the varying speeds of the data stream, \\\"anytime\\\" algorithms are proposed but so far only in full space stream clustering. However, data streams from many application domains contain abundance of dimensions; the clusters often exist only in specific subspaces (subset of dimensions) and do not show up in the full feature space. In this paper, the first algorithm that considers both the high dimensionality and the varying speeds of streaming data, is proposed. The algorithm, called SubClusTree, can flexibly adapt to the different stream speeds and makes the best use of available time to provide a high quality subspace clustering. The experimental results prove the effectiveness of our anytime subspace concept.\",\"PeriodicalId\":74773,\"journal\":{\"name\":\"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management\",\"volume\":\"50 1\",\"pages\":\"37:1-37:4\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-06-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"23\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2618243.2618286\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2618243.2618286","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 23

摘要

高维流数据的聚类是一个新兴的研究领域。现实生活中的数据流给集群任务带来了许多挑战,因为不断有无穷无尽的数据到达。在全空间流聚类方面已经做了大量的研究。为了处理数据流的变化速度,提出了“任意时间”算法,但到目前为止只适用于全空间流聚类。然而,来自许多应用领域的数据流包含丰富的维度;集群通常只存在于特定的子空间(维度的子集)中,而不会出现在完整的特征空间中。本文提出了第一种既考虑高维又考虑流数据速度变化的算法。该算法称为SubClusTree,可以灵活地适应不同的流速度,并充分利用可用时间提供高质量的子空间聚类。实验结果证明了任意时间子空间概念的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Subspace anytime stream clustering
Clustering of high dimensional streaming data is an emerging field of research. A real life data stream imposes many challenges on the clustering task, as an endless amount of data arrives constantly. A lot of research has been done in the full space stream clustering. To handle the varying speeds of the data stream, "anytime" algorithms are proposed but so far only in full space stream clustering. However, data streams from many application domains contain abundance of dimensions; the clusters often exist only in specific subspaces (subset of dimensions) and do not show up in the full feature space. In this paper, the first algorithm that considers both the high dimensionality and the varying speeds of streaming data, is proposed. The algorithm, called SubClusTree, can flexibly adapt to the different stream speeds and makes the best use of available time to provide a high quality subspace clustering. The experimental results prove the effectiveness of our anytime subspace concept.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Towards Co-Evolution of Data-Centric Ecosystems. Data perturbation for outlier detection ensembles SLACID - sparse linear algebra in a column-oriented in-memory database system SensorBench: benchmarking approaches to processing wireless sensor network data Efficient data management and statistics with zero-copy integration
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1