Reducing High-Dimensional Data by Principal Component Analysis vs. Random Projection for Nearest Neighbor Classification

Sampath Deegalla, Henrik Boström
{"title":"Reducing High-Dimensional Data by Principal Component Analysis vs. Random Projection for Nearest Neighbor Classification","authors":"Sampath Deegalla, Henrik Boström","doi":"10.1109/ICMLA.2006.43","DOIUrl":null,"url":null,"abstract":"The computational cost of using nearest neighbor classification often prevents the method from being applied in practice when dealing with high-dimensional data, such as images and micro arrays. One possible solution to this problem is to reduce the dimensionality of the data, ideally without loosing predictive performance. Two different dimensionality reduction methods, principle component analysis (PCA) and random projection (RP), are investigated for this purpose and compared w.r.t. the performance of the resulting nearest neighbor classifier on five image data sets and five micro array data sets. The experiment results demonstrate that PCA outperforms RP for all data sets used in this study. However, the experiments also show that PCA is more sensitive to the choice of the number of reduced dimensions. After reaching a peak, the accuracy degrades with the number of dimensions for PCA, while the accuracy for RP increases with the number of dimensions. The experiments also show that the use of PCA and RP may even outperform using the non-reduced feature set (in 9 respectively 6 cases out of 10), hence not only resulting in more efficient, but also more effective, nearest neighbor classification","PeriodicalId":297071,"journal":{"name":"2006 5th International Conference on Machine Learning and Applications (ICMLA'06)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"107","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2006 5th International Conference on Machine Learning and Applications (ICMLA'06)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2006.43","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 107

Abstract

The computational cost of nearest neighbor classification often prevents the method from being applied in practice to high-dimensional data, such as images and microarrays. One possible solution to this problem is to reduce the dimensionality of the data, ideally without losing predictive performance. Two different dimensionality reduction methods, principal component analysis (PCA) and random projection (RP), are investigated for this purpose and compared with respect to the performance of the resulting nearest neighbor classifier on five image data sets and five microarray data sets. The experimental results demonstrate that PCA outperforms RP on all data sets used in this study. However, the experiments also show that PCA is more sensitive to the choice of the number of reduced dimensions: accuracy for PCA peaks and then degrades as the number of dimensions grows, while accuracy for RP increases with the number of dimensions. The experiments also show that PCA and RP may even outperform using the non-reduced feature set (in 9 and 6 cases out of 10, respectively), yielding nearest neighbor classification that is not only more efficient but also more effective.
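The comparison described in the abstract can be reproduced in outline with standard library components. Below is a minimal sketch, not the authors' code: it assumes scikit-learn, the built-in digits image data set, a 1-nearest-neighbor classifier, a Gaussian random projection as the RP variant, and an arbitrary target of 20 dimensions; none of these specifics come from the paper itself.

```python
# Minimal sketch: PCA vs. random projection as preprocessing for a
# nearest neighbor classifier. Data set, RP variant, and the choice of
# 20 components are illustrative assumptions, not taken from the paper.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.random_projection import GaussianRandomProjection

X, y = load_digits(return_X_y=True)  # 64-dimensional image data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

for name, reducer in [
    ("PCA", PCA(n_components=20)),
    ("RP", GaussianRandomProjection(n_components=20, random_state=0)),
]:
    # Fit the reduction on the training data only, then project both sets.
    Z_train = reducer.fit_transform(X_train)
    Z_test = reducer.transform(X_test)
    knn = KNeighborsClassifier(n_neighbors=1).fit(Z_train, y_train)
    print(name, "accuracy:", knn.score(Z_test, y_test))
```

Sweeping n_components rather than fixing it at 20 would be the natural way to illustrate the paper's central observation: PCA accuracy peaks at a relatively small number of components and then declines, while RP accuracy keeps improving as the number of dimensions grows.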