Chainarong Amornbunchornvej, T. Limpiti, A. Assawamakin, A. Intarapanich, S. Tongsima
Title: Improved iterative pruning principal component analysis with graph-theoretic hierarchical clustering
Venue: 2012 9th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, pp. 1-4
Published: 2012-05-16
DOI: 10.1109/ECTICON.2012.6254120 (https://doi.org/10.1109/ECTICON.2012.6254120)
Citation count: 1
Abstract
Various unsupervised clustering algorithms have been used to infer population structure in genetic data. The goals are to separate individuals with similar genetic characteristics into clusters and to estimate the number of clusters within each dataset. One such framework is iterative pruning principal component analysis (ipPCA), which performs PCA iteratively on subsets of the data samples and clusters them using fuzzy c-means. We believe that the choice of a model-based clustering method affects the individual assignments and cluster quality, as well as the estimated number of clusters. Thus, in this paper we introduce into the ipPCA framework a hierarchical tree clustering concept from graph theory whose performance is independent of cluster shape. We also add a PCA-based feature selection technique as a data pre-processing step to reduce the data dimension and increase computational efficiency. The resulting algorithm is called HiClust-ipPCA. We illustrate the improved clustering results of the HiClust-ipPCA algorithm using 47-breed bovine and 28-breed sheep datasets.
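The abstract does not say how the graph-theoretic hierarchical clustering is constructed, so the following is only a minimal sketch of one common shape-independent approach, not the HiClust-ipPCA method itself. It assumes samples have already been projected to low-dimensional coordinates (for example, leading principal components), builds a minimum spanning tree over Euclidean distances, and removes the longest tree edge to split the samples into two groups. The function names `mst_edges` and `split_on_longest_edge` and the toy coordinates are hypothetical.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the n-1 tree edges as (weight, i, j) tuples.
    Assumption: `points` are low-dimensional sample coordinates.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax distances of the remaining nodes against the new tree node.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def split_on_longest_edge(points):
    """Cut the longest MST edge and return the two index sets it separates."""
    edges = mst_edges(points)
    longest = max(edges)
    kept = [e for e in edges if e is not longest]

    # Collect the connected component containing sample 0 by depth-first search.
    adjacency = {i: [] for i in range(len(points))}
    for _, i, j in kept:
        adjacency[i].append(j)
        adjacency[j].append(i)
    seen, stack = set(), [0]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(adjacency[u])

    left = sorted(seen)
    right = sorted(set(range(len(points))) - seen)
    return left, right


# Toy data: two well-separated groups of three samples each.
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(split_on_longest_edge(samples))  # ([0, 1, 2], [3, 4, 5])
```

Because the split depends only on the tree's edge lengths, it makes no assumption about cluster shape, which is the property the abstract emphasizes. An iterative scheme in the spirit of ipPCA would reapply PCA and the split to each resulting subset until some stopping rule is met; that rule is not given in the abstract and is left out here.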