PCAH: A PCA-Based Hierarchical Clustering Method for Visual Words Construction

Ying He, Jian Wang, Xue-xia Zhong, Lin Mei, Zhi-zong Wu
Published in: 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 1009-1018
Publication date: 2015-05-04
DOI: 10.1109/CCGrid.2015.33 (https://doi.org/10.1109/CCGrid.2015.33)
Citations: 3

Abstract

Most existing methods for generating a visual dictionary are based on local SIFT features and use the common K-means clustering method to build the dictionary. However, as the dimension of the local feature vectors grows, their distribution becomes sparse, which results in a high correlation distance between image vectors and reduces the comparability and universality of the visual patterns. To address this problem, this paper introduces a Principal Component Analysis Hierarchical clustering method (PCAH) that generates the visual dictionary from local SIFT features. The method effectively eases the curse of dimensionality and obtains better stability. It also handles the high dimensionality and structural complexity of the image feature space efficiently and performs better in generating the visual dictionary. The experiments are run on the pedestrian dataset Test_dataset1 (our own dataset), pos, the scene classification dataset Upright vs Inverted, and the behavior classification dataset Stanford40_JPEGImages. The datasets are divided into two groups by the number of SIFT features (fewer than 300 in one group and more than 5000 in the other). The Silhouette index and the computation time serve as the evaluation measures. The experimental results indicate that, compared with the K-means clustering algorithm, the proposed PCA-based hierarchical clustering method (PCAH) produces higher-quality visual words and also runs faster.
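The abstract names the Silhouette index as its quality measure but does not give the formula or the distance it uses. As a minimal sketch, the standard definition can be written in plain Python as follows; the Euclidean distance, the function names `euclidean` and `silhouette_index`, and the score of 0 for single-member clusters are assumptions for illustration, not details taken from the paper.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_index(points, labels):
    """Mean silhouette coefficient over all points, in the range [-1, 1].

    For each point: a = mean distance to the other members of its own
    cluster, b = smallest mean distance to the members of any other
    cluster, and the score is (b - a) / max(a, b).
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Assumed convention: a single-member cluster scores 0.
            continue
        a = sum(euclidean(points[i], points[j]) for j in own if j != i)
        a /= len(own) - 1
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            # Identical points give a = b = 0; that case contributes 0.
            total += (b - a) / denom
    return total / len(points)
```

A value close to 1 means points lie much nearer to their own cluster than to any other, so under this measure a clustering with a higher index yields better-separated visual words. Scoring the same descriptors under two label assignments, for example one from K-means and one from a hierarchical method, gives the kind of comparison the abstract describes.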