Ming-Juan Wu, Ying-Lian Gao, Jin-Xing Liu, Rong Zhu, Juan Wang
Human Heredity, vol. 84, no. 1, pp. 47-58. Published 2019-08-29. DOI: 10.1159/000501653 (https://doi.org/10.1159/000501653)
Principal Component Analysis Based on Graph Laplacian and Double Sparse Constraints for Feature Selection and Sample Clustering on Multi-View Data
Principal component analysis (PCA) is a widely used method for reducing high-dimensional data to a low-dimensional representation. Several variants of PCA have been proposed to improve the interpretability of the principal components (PCs). One of the most common is sparse PCA, which seeks a sparse basis that is easier to interpret than the dense basis of standard PCA. However, the performance of these improved methods is still far from satisfactory because the data still contain redundant PCs. In this paper, a novel method called PCA based on graph Laplacian and double sparse constraints (GDSPCA) is proposed to improve the interpretability of the PCs while taking the internal geometry of the data into account. Specifically, GDSPCA applies L2,1-norm and L1-norm regularization terms simultaneously to force the matrix to be sparse by filtering out redundant and irrelevant PCs: the L2,1-norm term produces row sparsity, while the L1-norm term enforces element-wise sparsity. This yields new PCs that are easier to interpret in the low-dimensional subspace. In addition, GDSPCA integrates the graph Laplacian into PCA to exploit the geometric structure hidden in the data. A simple and effective optimization solution is provided. Extensive experiments on multi-view biological data demonstrate the feasibility and effectiveness of the proposed approach.
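The abstract does not state the GDSPCA objective function, so the following is only a minimal sketch of the three ingredients it names, not the authors' implementation. It shows how the L2,1-norm and L1-norm of a matrix are computed and how a graph Laplacian can be built from samples. The function names, the binary k-nearest-neighbor graph, and the choice of the unnormalized Laplacian L = D - W are all assumptions made for illustration.

```python
import numpy as np


def l21_norm(M):
    """L2,1-norm: sum of the Euclidean norms of the rows of M.
    Penalizing it drives whole rows toward zero (row sparsity)."""
    return float(np.sqrt((M ** 2).sum(axis=1)).sum())


def l1_norm(M):
    """L1-norm: sum of absolute values of all entries of M.
    Penalizing it drives individual entries toward zero."""
    return float(np.abs(M).sum())


def knn_graph_laplacian(X, k=2):
    """Unnormalized Laplacian L = D - W of a symmetric, binary
    k-nearest-neighbor graph (an assumed construction).
    X holds one sample per row."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq, np.inf)  # a sample is not its own neighbor
    nearest = np.argsort(sq, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrize the adjacency matrix
    D = np.diag(W.sum(axis=1))
    return D - W


def laplacian_smoothness(Q, L):
    """tr(Q^T L Q): small when samples joined by an edge have
    similar low-dimensional representations (rows of Q)."""
    return float(np.trace(Q.T @ L @ Q))


# Hypothetical toy data: 6 samples, 4 features, 2 components.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
Q = rng.normal(size=(6, 2))
A = np.array([[3.0, 4.0], [0.0, 0.0], [0.0, 1.0]])

L = knn_graph_laplacian(X, k=2)
print("L2,1-norm of A:", l21_norm(A))
print("L1-norm of A:", l1_norm(A))
print("smoothness term:", laplacian_smoothness(Q, L))
```

In the toy matrix A, the all-zero second row contributes nothing to the L2,1-norm, which is why a penalty on that norm tends to switch off entire rows, whereas the L1-norm treats every entry separately.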
About the journal:
Gathering original research reports and short communications from all over the world, Human Heredity is devoted to methodological and applied research on the genetics of human populations, association and linkage analysis, genetic mechanisms of disease, and new methods for statistical genetics, for example, analysis of rare variants and results from next generation sequencing. The value of this information to many branches of medicine is shown by the number of citations the journal receives in fields ranging from immunology and hematology to epidemiology and public health planning, and the fact that at least 50% of all Human Heredity papers are still cited more than 8 years after publication (according to ISI Journal Citation Reports). Special issues on methodological topics (such as ‘Consanguinity and Genomics’ in 2014; ‘Analyzing Rare Variants in Complex Diseases’ in 2012) or reviews of advances in particular fields (‘Genetic Diversity in European Populations: Evolutionary Evidence and Medical Implications’ in 2014; ‘Genes and the Environment in Obesity’ in 2013) are published every year. Renowned experts in the field are invited to contribute to these special issues.