{"title":"Predictive Subspace Clustering","authors":"B. McWilliams, G. Montana","doi":"10.1109/ICMLA.2011.117","DOIUrl":null,"url":null,"abstract":"The problem of detecting clusters in high-dimensional data is increasingly common in machine learning applications, for instance in computer vision and bioinformatics. Recently, a number of approaches in the field of subspace clustering have been proposed which search for clusters in subspaces of unknown dimensions. Learning the number of clusters, the dimension of each subspace, and the correct assignments is a challenging task, and many existing algorithms often perform poorly in the presence of subspaces that have different dimensions and possibly overlap, or are otherwise computationally expensive. In this work we present a novel approach to subspace clustering that learns the numbers of clusters and the dimensionality of each subspace in an efficient way. We assume that the data points in each cluster are well represented in low-dimensions by a PCA model. We propose a measure of predictive influence of data points modelled by PCA which we minimise to drive the clustering process. The proposed predictive subspace clustering algorithm is assessed on both simulated data and on the popular Yale faces database where state-of-the-art performance and speed are obtained.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 10th International Conference on Machine Learning and Applications and Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2011.117","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
The problem of detecting clusters in high-dimensional data is increasingly common in machine learning applications, for instance in computer vision and bioinformatics. Recently, a number of approaches in the field of subspace clustering have been proposed which search for clusters in subspaces of unknown dimensions. Learning the number of clusters, the dimension of each subspace, and the correct assignments is a challenging task, and many existing algorithms often perform poorly in the presence of subspaces that have different dimensions and possibly overlap, or are otherwise computationally expensive. In this work we present a novel approach to subspace clustering that learns the numbers of clusters and the dimensionality of each subspace in an efficient way. We assume that the data points in each cluster are well represented in low-dimensions by a PCA model. We propose a measure of predictive influence of data points modelled by PCA which we minimise to drive the clustering process. The proposed predictive subspace clustering algorithm is assessed on both simulated data and on the popular Yale faces database where state-of-the-art performance and speed are obtained.