Discovering Clusters with Arbitrary Shapes and Densities in Data Streams

2011 10th International Conference on Machine Learning and Applications and Workshops
Pub Date: 2011-12-18
DOI: 10.1109/ICMLA.2011.56

A. Magdy, N. A. Yousri, Nagwa M. El-Makky


Citations: 6

Abstract



The availability of streaming data in many fields and forms increases the importance of streaming data analysis. The sheer volume of continuously flowing data poses a number of challenges for data stream analysis. Exploring the structure of streamed data is one major challenge, and it has led to the introduction of various clustering algorithms. However, current clustering algorithms still lack the ability to efficiently discover clusters of arbitrary densities in data streams. In this paper, a new grid-based and density-based algorithm is proposed for clustering streaming data. It addresses the drawbacks of recent algorithms in discovering clusters of arbitrary densities. The algorithm uses an online component to map the input data to grid cells. An offline component is then used to cluster the grid cells based on density information. Relative density relatedness measures and a dynamic range neighborhood are proposed to differentiate clusters of arbitrary densities. The experimental evaluation shows considerable improvements over state-of-the-art algorithms in both clustering quality and scalability. In addition, the output quality of the proposed algorithm is less sensitive to parameter selection errors.
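The two-stage structure named in the abstract (an online step that maps points to grid cells, then an offline step that clusters the cells by density) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not define the relative density relatedness measures or the dynamic range neighborhood, so the sketch substitutes a fixed density threshold and a fixed one-cell neighborhood. The names `map_to_cells`, `cluster_cells`, `cell_size`, and `min_count` are hypothetical and chosen only for illustration.

```python
from collections import Counter, deque
from itertools import product
from math import floor


def map_to_cells(points, cell_size):
    """Online step (sketch): count how many points fall in each grid cell."""
    counts = Counter()
    for point in points:
        # A cell is identified by the integer grid coordinates of the point.
        cell = tuple(floor(x / cell_size) for x in point)
        counts[cell] += 1
    return counts


def cluster_cells(counts, min_count):
    """Offline step (sketch): group adjacent dense cells into clusters.

    Assumption: a fixed threshold and a fixed one-cell neighborhood stand in
    for the paper's relative density measures, which the abstract omits.
    """
    dense = {cell for cell, n in counts.items() if n >= min_count}
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        # Breadth-first search over neighboring dense cells.
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(c + o for c, o in zip(cell, offset))
                if neighbor in dense and neighbor not in labels:
                    labels[neighbor] = next_label
                    queue.append(neighbor)
        next_label += 1
    return labels


# Toy input: two dense regions and one isolated point.
stream = [(0.1, 0.2), (0.3, 0.1), (1.2, 0.4), (1.4, 0.3),
          (5.1, 5.2), (5.3, 5.4), (9.9, 0.1)]
cell_counts = map_to_cells(stream, cell_size=1.0)
cell_labels = cluster_cells(cell_counts, min_count=2)
print(cell_labels)
```

With a single global threshold like `min_count`, a sparse cluster sitting next to a dense one is either dropped or merged with it. That limitation is the kind of problem the abstract says the relative density measures are meant to address.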
