PCA as a practical indicator of OPLS-DA model reliability.

Current Metabolomics, vol. 4, no. 2, pp. 97-103. Pub Date: 2016-01-01. DOI: 10.2174/2213235X04666160613122429

Bradley Worley, Robert Powers


Citations: 232

Abstract




Background: Principal Component Analysis (PCA) and Orthogonal Projections to Latent Structures Discriminant Analysis (OPLS-DA) are powerful statistical modeling tools that provide insights into separations between experimental groups based on high-dimensional spectral measurements from NMR, MS or other analytical instrumentation. However, when used without validation, these tools may lead investigators to statistically unreliable conclusions. This danger is especially real for Partial Least Squares (PLS) and OPLS, which aggressively force separations between experimental groups. As a result, OPLS-DA is often used as an alternative method when PCA fails to expose group separation, but this practice is highly dangerous. Without rigorous validation, OPLS-DA can easily yield statistically unreliable group separation.
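
The claim that PLS-type models force group separation can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' code: it fits a one-component PLS-DA model (what OPLS-DA reduces to when no orthogonal components are extracted) to pure Gaussian noise with arbitrary group labels. The helper names `fit_predict_pls1` and `r2y_and_q2`, the 20 x 500 data size, the random seed, and the seven interleaved cross-validation folds are all hypothetical choices for illustration. Because there are far more variables than samples, the fitted scores typically separate the two groups almost perfectly (high R2Y), while the cross-validated Q2 stays near or below zero.

```python
import numpy as np

def fit_predict_pls1(X_train, y_train, X_test):
    """One-component PLS regression on a 0/1 class vector (PLS-DA),
    fitted on mean-centered training data; returns predictions for X_test."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc, yc = X_train - x_mean, y_train - y_mean
    w = Xc.T @ yc                      # PLS weight vector (single response)
    w /= np.linalg.norm(w)
    t = Xc @ w                         # training scores
    b = (t @ yc) / (t @ t)             # inner regression of y on the scores
    return y_mean + b * ((X_test - x_mean) @ w)

def r2y_and_q2(X, y, n_folds=7):
    """R2Y of the fitted model and cross-validated Q2 = 1 - PRESS / TSS."""
    idx = np.arange(len(y))
    tss = np.sum((y - y.mean()) ** 2)
    r2y = 1.0 - np.sum((y - fit_predict_pls1(X, y, X)) ** 2) / tss
    press = 0.0
    for k in range(n_folds):
        test = idx[k::n_folds]         # interleaved folds: every n_folds-th sample
        train = np.setdiff1d(idx, test)
        y_hat = fit_predict_pls1(X[train], y[train], X[test])
        press += np.sum((y[test] - y_hat) ** 2)
    return r2y, 1.0 - press / tss

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 500))     # pure noise: 20 "spectra", 500 variables
y = np.repeat([0.0, 1.0], 10)          # arbitrary group labels, no real effect
r2y, q2 = r2y_and_q2(X, y)
print(f"R2Y = {r2y:.2f}  Q2 = {q2:.2f}")
```

A large gap between R2Y and Q2 is the signature of an overfitted model; a training scores plot alone would not reveal it.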

Methods: A Monte Carlo analysis of PCA group separations and OPLS-DA cross-validation metrics was performed on NMR datasets with statistically significant separations in scores-space. A linearly increasing amount of Gaussian noise was added to each data matrix followed by the construction and validation of PCA and OPLS-DA models.
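
The Monte Carlo procedure above can be sketched as follows. Because the original NMR datasets are not available here, the sketch builds a synthetic two-group data matrix (an assumption), adds Gaussian noise whose standard deviation increases linearly over `noise_levels`, and at each level records (i) the Mahalanobis distance between the group means in the first two PCA scores, using the pooled within-group covariance, and (ii) the cross-validated Q2 of a one-component PLS-DA model standing in for OPLS-DA. The distance metric, data dimensions, number of replicates, and all helper names are hypothetical choices; the abstract does not specify them.

```python
import numpy as np

def pca_scores(X, n_comp=2):
    """PCA scores of mean-centered X (rows = samples) via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_comp] * s[:n_comp]

def mahalanobis_between_groups(T, labels):
    """Mahalanobis distance between the two group means in scores space,
    using the pooled within-group covariance."""
    A, B = T[labels == 0], T[labels == 1]
    diff = A.mean(axis=0) - B.mean(axis=0)
    pooled = ((len(A) - 1) * np.cov(A, rowvar=False)
              + (len(B) - 1) * np.cov(B, rowvar=False)) / (len(A) + len(B) - 2)
    return float(np.sqrt(diff @ np.linalg.solve(pooled, diff)))

def pls1_predict(X_tr, y_tr, X_te):
    """One-component PLS-DA fitted on centered training data (OPLS-DA stand-in)."""
    xm, ym = X_tr.mean(axis=0), y_tr.mean()
    Xc, yc = X_tr - xm, y_tr - ym
    w = Xc.T @ yc
    w /= np.linalg.norm(w)
    t = Xc @ w
    b = (t @ yc) / (t @ t)
    return ym + b * ((X_te - xm) @ w)

def q2_cv(X, y, n_folds=7):
    """Cross-validated Q2 = 1 - PRESS / TSS with interleaved folds."""
    idx = np.arange(len(y))
    press = 0.0
    for k in range(n_folds):
        test = idx[k::n_folds]
        train = np.setdiff1d(idx, test)
        press += np.sum((y[test] - pls1_predict(X[train], y[train], X[test])) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
n_per, p = 15, 200
labels = np.repeat([0, 1], n_per)
y = labels.astype(float)
d = np.zeros(p)
d[:20] = 1.0 / np.sqrt(20)               # hypothetical discriminating variables
# Synthetic "original" data: clear group offset along d plus small baseline noise.
X0 = np.outer(4.0 * (y - 0.5), d) + 0.1 * rng.standard_normal((2 * n_per, p))

noise_levels = np.linspace(0.0, 2.0, 5)  # linearly increasing noise SD
n_rep = 10                               # Monte Carlo replicates per level
mean_dist, mean_q2 = [], []
for sigma in noise_levels:
    dists, q2s = [], []
    for _ in range(n_rep):
        X = X0 + sigma * rng.standard_normal(X0.shape)
        dists.append(mahalanobis_between_groups(pca_scores(X), labels))
        q2s.append(q2_cv(X, y))
    mean_dist.append(float(np.mean(dists)))
    mean_q2.append(float(np.mean(q2s)))
    print(f"sigma={sigma:.1f}  D_M={mean_dist[-1]:7.2f}  Q2={mean_q2[-1]:.2f}")
```

Averaging over replicates at each noise level is what makes this a Monte Carlo estimate; more replicates reduce run-to-run scatter at the cost of run time.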

Results: With increasing added noise, the PCA scores-space distance between groups rapidly decreased and the OPLS-DA cross-validation statistics simultaneously deteriorated. A decrease in correlation between the estimated loadings (from the noise-added data) and the true (original) loadings was also observed. While the validity of the OPLS-DA model diminished with increasing added noise, the group separation in OPLS-DA scores-space remained basically unaffected.
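
The loadings comparison can be sketched in a synthetic setting similar to a two-group NMR matrix. The synthetic data and the use of the first PCA loading vector are assumptions; the abstract does not say which model's loadings were compared. The hypothetical helper `loading_correlation` returns the absolute Pearson correlation between the loading vector of the original matrix and that of each noise-added copy; the absolute value is needed because the sign of an SVD-derived component is arbitrary.

```python
import numpy as np

def pc1_loadings(X):
    """First principal-component loading vector of mean-centered X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

def loading_correlation(p_true, p_est):
    """|Pearson r| between two loading vectors (sign of a PC is arbitrary)."""
    return abs(np.corrcoef(p_true, p_est)[0, 1])

rng = np.random.default_rng(2)
n_per, p = 15, 200
y = np.repeat([0.0, 1.0], n_per)
d = np.zeros(p)
d[:20] = 1.0 / np.sqrt(20)               # hypothetical discriminating variables
# Synthetic "original" data: group offset along d plus small baseline noise.
X0 = np.outer(4.0 * (y - 0.5), d) + 0.1 * rng.standard_normal((2 * n_per, p))
p_true = pc1_loadings(X0)                # "true" loadings from the original data

noise_levels = np.linspace(0.0, 2.0, 5)  # linearly increasing noise SD
mean_r = []
for sigma in noise_levels:
    rs = [loading_correlation(p_true,
                              pc1_loadings(X0 + sigma * rng.standard_normal(X0.shape)))
          for _ in range(10)]            # 10 Monte Carlo replicates per level
    mean_r.append(float(np.mean(rs)))
    print(f"sigma={sigma:.1f}  |r(p_true, p_est)|={mean_r[-1]:.2f}")
```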

Conclusion: Supported by the results of Monte Carlo analyses of PCA group separations and OPLS-DA cross-validation metrics, we provide practical guidelines and cross-validatory recommendations for reliable inference from PCA and OPLS-DA models.

