Authors: Shimei Wang, Yun Liu, Bo Shen
Published in: Proceedings of the 11th International Knowledge Management in Organizations Conference on The changing face of Knowledge Management Impacting Society
Publication date: 2016-07-25
DOI: https://doi.org/10.1145/2925995.2926040
Citations: 16

MDBSCAN: Multi-level Density Based Spatial Clustering of Applications with Noise
With the rapid development of information technology, increasingly complex data is being produced, and mining valuable information from such data has practical significance. Clustering is an important research topic in data mining. As a density-based clustering algorithm, DBSCAN is sensitive to its input parameters and has difficulty finding all meaningful clusters in datasets with varied densities. To address this shortcoming, this paper proposes the MDBSCAN algorithm, which generates two different density parameters by a statistical method so that datasets with varied densities can be clustered more accurately. First, the algorithm uses an adjacency list to store the graph generated from the dataset with a single parameter, Eps. The adjacency list built in this first step is then used to generate the two density parameters, MinPts0 and MinPts1. Next, clustering is carried out based on these parameters and the adjacency list. Finally, experimental results show that, compared with DBSCAN, the proposed algorithm achieves higher accuracy on datasets with varied densities while having a similar running time.
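The first step described in the abstract, storing the Eps-neighborhood graph as an adjacency list, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `build_adjacency_list` and `neighbor_counts`, the brute-force pairwise distance check, the Euclidean metric, and the choice to exclude a point from its own neighbor list are all assumptions made for illustration. The abstract does not describe the statistical method that derives MinPts0 and MinPts1, so the sketch stops at the per-point neighbor counts that such a method would presumably take as input.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def build_adjacency_list(points, eps):
    """Return, for each point index, the indices of all other points within eps.

    Assumption: brute-force O(n^2) comparison with Euclidean distance;
    a point is not listed as its own neighbor.
    """
    adjacency = [[] for _ in points]
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                # The Eps-graph is undirected, so record the edge on both sides.
                adjacency[i].append(j)
                adjacency[j].append(i)
    return adjacency


def neighbor_counts(adjacency):
    """Number of Eps-neighbors per point (the degree of each graph node)."""
    return [len(neighbors) for neighbors in adjacency]


# Hypothetical toy data: a tight group of three, a pair, and an isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (50.0, 50.0)]
adjacency = build_adjacency_list(points, eps=1.5)
counts = neighbor_counts(adjacency)
print(adjacency)
print(counts)
```

Because the adjacency list is built once for a fixed Eps, later steps can read each point's neighbors directly instead of recomputing distances, which is consistent with the abstract's statement that the parameters and the clustering are both derived from the same list.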