Ying He, Jian Wang, Xue-xia Zhong, Lin Mei, Zhi-zong Wu
PCAH: A PCA-Based Hierarchical Clustering Method for Visual Words Construction
2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 1009-1018. Published 2015-05-04. DOI: 10.1109/CCGrid.2015.33 (https://doi.org/10.1109/CCGrid.2015.33)
Citations: 3
Abstract
Most existing methods for generating a visual dictionary are based on local SIFT features and use the common K-means clustering method to build the dictionary. However, as the dimension of the local feature vectors grows, their distribution becomes sparse, which results in a high correlation distance between image vectors and reduces the comparability and universality of the visual patterns. To address this problem, this paper introduces a Principal Component Analysis Hierarchical clustering method (PCAH) that generates the visual dictionary from local SIFT features. The method effectively eases the curse of dimensionality and obtains better stability. It also handles the high dimensionality and structural complexity of the image feature space efficiently, and performs better in generating the visual dictionary. The experiments are run on the pedestrian dataset Test_dataset1 (our own dataset), pos, the scene classification dataset Upright vs Inverted, and the behavior classification dataset Stanford40_JPEGImages. The datasets are divided into two groups by the number of SIFT features (one with fewer than 300 and the other with more than 5000). We adopt the Silhouette index and the computation time as the evaluation indices. The experimental results indicate that, compared with the K-means clustering algorithm, the proposed PCA-based Hierarchical clustering method (PCAH) produces higher-quality visual words and also runs faster.
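The Silhouette index named above as the quality measure can be illustrated with a minimal, standard-library sketch. It is not the authors' evaluation code: the function names `euclidean` and `silhouette_index`, the choice of Euclidean distance, and the toy 2-D points are assumptions made for illustration (real SIFT descriptors are 128-dimensional). For each point, `a` is the mean distance to the other members of its own cluster and `b` is the smallest mean distance to the members of any other cluster; the point's score is `(b - a) / max(a, b)`, and the index is the mean over all points.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def silhouette_index(points, labels):
    """Mean silhouette coefficient of a clustering (illustrative helper)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Common convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(euclidean(points[i], points[j]) for j in own if j != i)
        a /= len(own) - 1
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Toy data (assumed): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
compact = silhouette_index(points, [0, 0, 1, 1])  # groups nearby points
mixed = silhouette_index(points, [0, 1, 0, 1])    # groups distant points
print(compact > mixed)
```

Values close to 1 indicate compact, well-separated clusters, and values below 0 indicate points that sit closer to another cluster than to their own. Because the score depends only on pairwise distances, the same function can compare any two clusterings of the same descriptors, for example one from K-means and one from a hierarchical method.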