A distribution-based clustering algorithm for mining in large spatial databases
Xiaowei Xu, M. Ester, H. Kriegel, J. Sander
Proceedings 14th International Conference on Data Engineering, 1998-02-23
DOI: 10.1109/ICDE.1998.655795 (https://doi.org/10.1109/ICDE.1998.655795)
Citations: 383
Abstract
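The problem of detecting clusters of points belonging to a spatial point process arises in many applications. In this paper, we introduce the new clustering algorithm DBCLASD (Distribution-Based Clustering of LArge Spatial Databases) to discover clusters of this type. The results of experiments demonstrate that DBCLASD, contrary to partitioning algorithms such as CLARANS (Clustering Large Applications based on RANdomized Search), discovers clusters of arbitrary shape. Furthermore, DBCLASD does not require any input parameters, in contrast to the clustering algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise), which requires two input parameters that may be difficult to provide for large databases. In terms of efficiency, DBCLASD is between CLARANS and DBSCAN, close to DBSCAN. Thus, the efficiency of DBCLASD on large spatial databases is very attractive when considering its nonparametric nature and its good quality for clusters of arbitrary shape.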
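To make the abstract's remark about DBSCAN's two input parameters concrete, here is a minimal sketch of DBSCAN (not DBCLASD, whose procedure the abstract does not describe). The names `eps` and `min_pts` follow the neighborhood radius and minimum point count commonly called Eps and MinPts; the sample points are hypothetical and chosen only for illustration. It suggests how the same data can yield very different clusterings depending on the radius a user supplies.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns one label per point (-1 means noise)."""
    labels = [None] * len(points)  # None = not yet visited

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels


# Hypothetical data: two small square groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (10, 0)]

print(dbscan(points, eps=1.5, min_pts=3))  # two clusters plus one noise point
print(dbscan(points, eps=0.5, min_pts=3))  # radius too small: everything is noise
print(dbscan(points, eps=6.0, min_pts=3))  # radius too large: the two groups merge
```

On this toy data only the middle setting separates the two groups; with a large database and no prior knowledge of its density, choosing such values is the difficulty the abstract refers to.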