PCA as a practical indicator of OPLS-DA model reliability.

Bradley Worley, Robert Powers
Current Metabolomics, vol. 4, no. 2, pp. 97-103. Published 2016-01-01. DOI: 10.2174/2213235X04666160613122429 (https://doi.org/10.2174/2213235X04666160613122429)
Citations: 232

Abstract

Background: Principal Component Analysis (PCA) and Orthogonal Projections to Latent Structures Discriminant Analysis (OPLS-DA) are powerful statistical modeling tools that provide insights into separations between experimental groups based on high-dimensional spectral measurements from NMR, MS or other analytical instrumentation. However, when used without validation, these tools may lead investigators to statistically unreliable conclusions. This danger is especially real for Partial Least Squares (PLS) and OPLS, which aggressively force separations between experimental groups. As a result, OPLS-DA is often used as an alternative method when PCA fails to expose group separation, but this practice is highly dangerous. Without rigorous validation, OPLS-DA can easily yield statistically unreliable group separation.

Methods: A Monte Carlo analysis of PCA group separations and OPLS-DA cross-validation metrics was performed on NMR datasets with statistically significant separations in scores-space. A linearly increasing amount of Gaussian noise was added to each data matrix, after which PCA and OPLS-DA models were constructed and validated.
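
The noise-addition procedure can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' protocol: the synthetic two-group data, the noise levels, the number of trials, and the separation measure (centroid distance divided by pooled within-group spread) are all hypothetical, and the helper names `make_toy_data`, `pca_scores`, `separation`, and `noise_sweep` are not taken from the paper.

```python
import numpy as np


def make_toy_data(n_per_group=20, n_vars=50, seed=1):
    """Synthetic stand-in for a spectral data matrix (hypothetical data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, 0.1, size=(2 * n_per_group, n_vars))
    X[n_per_group:, :5] += 1.0  # second group differs in five variables
    labels = np.repeat([0, 1], n_per_group)
    return X, labels


def pca_scores(X, n_components=2):
    """PCA scores of the mean-centered matrix, computed via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def separation(scores, labels):
    """Centroid distance over pooled within-group spread (an assumed metric)."""
    groups = [scores[labels == g] for g in np.unique(labels)]
    gap = np.linalg.norm(groups[0].mean(axis=0) - groups[1].mean(axis=0))
    within = np.mean([g.var(axis=0, ddof=1).sum() for g in groups])
    return gap / np.sqrt(within)


def noise_sweep(X, labels, noise_levels, n_trials=20, seed=0):
    """Mean PCA separation at each noise level, averaged over random trials."""
    rng = np.random.default_rng(seed)
    means = []
    for sigma in noise_levels:
        trials = [
            separation(pca_scores(X + rng.normal(0.0, sigma, X.shape)), labels)
            for _ in range(n_trials)
        ]
        means.append(float(np.mean(trials)))
    return means


X, labels = make_toy_data()
levels = np.linspace(0.0, 2.0, 5)  # linearly increasing noise standard deviation
sweep = noise_sweep(X, labels, levels)
for sigma, sep in zip(levels, sweep):
    print(f"noise sd = {sigma:.1f}  mean PCA separation = {sep:.2f}")
```

In this toy setting the separation measure should fall as the noise standard deviation grows, which mirrors the qualitative behavior described for the PCA scores in the abstract; the exact values depend entirely on the assumed data and metric.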

Results: With increasing added noise, the PCA scores-space distance between groups rapidly decreased and the OPLS-DA cross-validation statistics simultaneously deteriorated. The correlation between the loadings estimated from the noise-added data and the true (original) loadings also decreased. Although the validity of the OPLS-DA model diminished with increasing added noise, the group separation in the OPLS-DA scores-space remained essentially unaffected.
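
To make the idea of a cross-validation statistic concrete, the sketch below computes a cross-validated Q² value, taken here as 1 - PRESS/TSS. It is a sketch under stated assumptions, not the authors' workflow: a single-component PLS-DA model written directly in NumPy stands in for OPLS-DA, the seven-fold split and the synthetic data are arbitrary choices, and the helper names `pls1_fit`, `pls1_predict`, and `q2_score` are hypothetical.

```python
import numpy as np


def pls1_fit(X, y):
    """Fit a one-component PLS model (used here as a stand-in for OPLS-DA)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc
    w /= np.linalg.norm(w)       # unit-length weight vector
    t = Xc @ w                   # training scores
    b = (t @ yc) / (t @ t)       # regression of y on the scores
    return x_mean, y_mean, w, b


def pls1_predict(model, X):
    """Predict class values for new samples from a fitted model."""
    x_mean, y_mean, w, b = model
    return y_mean + ((X - x_mean) @ w) * b


def q2_score(X, y, n_folds=7, seed=0):
    """Cross-validated Q2 = 1 - PRESS / TSS over a random k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    press = 0.0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = pls1_fit(X[train_idx], y[train_idx])
        residual = y[test_idx] - pls1_predict(model, X[test_idx])
        press += float(np.sum(residual ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - press / tss


rng = np.random.default_rng(2)
y = np.repeat([0.0, 1.0], 20)                  # two groups of 20 samples
X_signal = rng.normal(0.0, 0.3, size=(40, 50))
X_signal[20:, :5] += 1.0                       # real group difference
X_noise = rng.normal(0.0, 0.3, size=(40, 50))  # no group difference at all

q2_signal = q2_score(X_signal, y)
q2_noise = q2_score(X_noise, y)
print(f"Q2 with a real group difference: {q2_signal:.2f}")
print(f"Q2 with pure noise:              {q2_noise:.2f}")
```

A supervised model can draw apart the training scores of two groups even when the data contain no real difference, so a scores plot alone says little about reliability. The held-out Q² value is what distinguishes the two synthetic cases here.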

Conclusion: Supported by the results of Monte Carlo analyses of PCA group separations and OPLS-DA cross-validation metrics, we provide practical guidelines and cross-validatory recommendations for reliable inference from PCA and OPLS-DA models.
