Efficient algorithm for projected clustering
Eric Ka Ka Ng, A. Fu
Proceedings 18th International Conference on Data Engineering, 2002-02-26
DOI: 10.1109/ICDE.2002.994727 (https://doi.org/10.1109/ICDE.2002.994727)
Citations: 7
Abstract
In high-dimensional data, natural clusters are expected to exist in different subspaces. We propose the EPC (efficient projected clustering) algorithm to discover both the sets of correlated dimensions and the locations of the clusters. The algorithm differs substantially from previous approaches and has the following advantages: (1) it does not require the number of natural clusters or the average cardinality of the subspaces as input; (2) it can handle clusters of irregular shapes; (3) it produces better clustering results than the best previous method; (4) it is highly scalable. In experiments, it is several times faster than the previous method while producing more accurate results.
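To make the problem setting concrete, the following is a minimal illustrative sketch of what "clusters in different subspaces" means. It is not the EPC algorithm, whose steps are not described in the abstract. The helper names (`make_projected_cluster`, `low_spread_dimensions`), the 0 to 100 value range, the spread, and the threshold are all assumptions chosen for illustration. The sketch builds two synthetic clusters, each tight in its own subset of dimensions and uniformly scattered in the rest, and checks which dimensions show low spread.

```python
import random
import statistics

def make_projected_cluster(n, dims, center, spread, rng):
    """Generate n points that are tight around `center` only in the
    dimensions listed in `center` (a dict: dimension -> value).
    All other dimensions are uniform noise on [0, 100].
    Hypothetical helper for illustration only."""
    points = []
    for _ in range(n):
        p = [rng.uniform(0.0, 100.0) for _ in range(dims)]
        for d, c in center.items():
            p[d] = rng.gauss(c, spread)
        points.append(p)
    return points

def low_spread_dimensions(points, threshold):
    """Return the dimensions whose population standard deviation
    falls below `threshold` (an assumed, hand-picked cutoff)."""
    dims = len(points[0])
    return [d for d in range(dims)
            if statistics.pstdev([p[d] for p in points]) < threshold]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Cluster A is correlated in dimensions 0 and 3; cluster B in 1, 2 and 4.
cluster_a = make_projected_cluster(200, 5, {0: 20.0, 3: 70.0}, 2.0, rng)
cluster_b = make_projected_cluster(200, 5, {1: 50.0, 2: 10.0, 4: 90.0}, 2.0, rng)

dims_a = low_spread_dimensions(cluster_a, threshold=10.0)
dims_b = low_spread_dimensions(cluster_b, threshold=10.0)
dims_all = low_spread_dimensions(cluster_a + cluster_b, threshold=10.0)

print("cluster A:", dims_a)
print("cluster B:", dims_b)
print("all points:", dims_all)
```

In this sketch the cluster memberships are known in advance, which is what makes the per-cluster check trivial. Over the combined data set no single dimension stands out, which is why a projected clustering method has to find the clusters and their relevant dimensions together.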