An improved measure for data clustering in high dimensional space
Snehalika Lall, Rimita Lahiri, A. Konar, Sanchita Ghosh
2016 International Conference on Microelectronics, Computing and Communications (MicroCom)
DOI: 10.1109/MICROCOM.2016.7522565
Citations: 2
Abstract
The k-means algorithm fails to cluster data points correctly in high-dimensional space, primarily because it uses the Euclidean norm as its distance metric. Euclidean distance grows as the data dimension increases, which makes it difficult to separate intra-cluster from inter-cluster data points. As a result, k-means with the Euclidean norm often selects misleading cluster centres in a given iteration. This paper proposes a variant of the k-means algorithm that replaces the Euclidean distance with a new metric. The merit of the proposed metric is that it keeps distances low even for high-dimensional data points, which enables the algorithm to select the cluster centres correctly over the iterations. Experiments show that k-means with the proposed metric outperforms the traditional algorithm by a large margin.
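The claim that Euclidean distance grows with the data dimension can be checked with a small experiment. The sketch below is only an illustration and is not taken from the paper: the function name `mean_pairwise_euclidean`, the uniform sampling in the unit hypercube, and the sample size are all assumptions chosen for the demonstration, and the paper's proposed metric is not reproduced here because the abstract does not define it.

```python
import math
import random


def mean_pairwise_euclidean(dim, n_points=50, seed=0):
    """Average Euclidean distance between uniformly random points in [0, 1]^dim.

    Hypothetical helper for illustration; not part of the paper.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]

    total = 0.0
    pairs = 0
    for i in range(n_points):
        for j in range(i + 1, n_points):
            # math.dist computes the Euclidean distance between two points.
            total += math.dist(points[i], points[j])
            pairs += 1
    return total / pairs


if __name__ == "__main__":
    # The average distance rises steadily as the dimension increases.
    for dim in (2, 10, 100, 1000):
        print(dim, round(mean_pairwise_euclidean(dim), 3))
```

For uniform data of this kind the average distance grows roughly in proportion to the square root of the dimension, which is the behaviour the abstract identifies as the source of difficulty for standard k-means.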