使用 MUVR2 在机器学习中调整协变量和评估建模适配性

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Bioinformatics advances Pub Date : 2024-04-04 DOI:10.1093/bioadv/vbae051
Yingxiao Yan, T. Schillemans, Viktor Skantze, Carl Brunius
{"title":"使用 MUVR2 在机器学习中调整协变量和评估建模适配性","authors":"Yingxiao Yan, T. Schillemans, Viktor Skantze, Carl Brunius","doi":"10.1093/bioadv/vbae051","DOIUrl":null,"url":null,"abstract":"Abstract Motivation Machine learning (ML) methods are frequently used in Omics research to examine associations between molecular data and for example exposures and health conditions. ML is also used for feature selection to facilitate biological interpretation. Our previous MUVR algorithm was shown to generate predictions and variable selections at state-of-the-art performance. However, a general framework for assessing modeling fitness is still lacking. In addition, enabling to adjust for covariates is a highly desired, but largely lacking trait in ML. We aimed to address these issues in the new MUVR2 framework. Results The MUVR2 algorithm was developed to include the regularized regression framework elastic net in addition to partial least squares and random forest modeling. Compared with other cross-validation strategies, MUVR2 consistently showed state-of-the-art performance, including variable selection, while minimizing overfitting. Testing on simulated and real-world data, we also showed that MUVR2 allows for the adjustment for covariates using elastic net modeling, but not using partial least squares or random forest. Availability and implementation Algorithms, data, scripts, and a tutorial are open source under GPL-3 license and available in the MUVR2 R package at https://github.com/MetaboComp/MUVR2.","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adjusting for covariates and assessing modeling fitness in machine learning using MUVR2\",\"authors\":\"Yingxiao Yan, T. Schillemans, Viktor Skantze, Carl Brunius\",\"doi\":\"10.1093/bioadv/vbae051\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Motivation Machine learning (ML) methods are frequently used in Omics research to examine associations between molecular data and for example exposures and health conditions. ML is also used for feature selection to facilitate biological interpretation. Our previous MUVR algorithm was shown to generate predictions and variable selections at state-of-the-art performance. However, a general framework for assessing modeling fitness is still lacking. In addition, enabling to adjust for covariates is a highly desired, but largely lacking trait in ML. We aimed to address these issues in the new MUVR2 framework. Results The MUVR2 algorithm was developed to include the regularized regression framework elastic net in addition to partial least squares and random forest modeling. Compared with other cross-validation strategies, MUVR2 consistently showed state-of-the-art performance, including variable selection, while minimizing overfitting. Testing on simulated and real-world data, we also showed that MUVR2 allows for the adjustment for covariates using elastic net modeling, but not using partial least squares or random forest. Availability and implementation Algorithms, data, scripts, and a tutorial are open source under GPL-3 license and available in the MUVR2 R package at https://github.com/MetaboComp/MUVR2.\",\"PeriodicalId\":72368,\"journal\":{\"name\":\"Bioinformatics advances\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-04-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics advances\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/bioadv/vbae051\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics advances","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioadv/vbae051","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0

摘要

摘要 动机 机器学习(ML)方法经常被用于 Omics 研究,以检查分子数据与暴露和健康状况等之间的关联。机器学习还用于特征选择,以促进生物学解释。我们之前的 MUVR 算法已证明能以最先进的性能生成预测和变量选择。然而,目前仍缺乏评估建模适配性的通用框架。此外,能够根据协变量进行调整也是 ML 非常需要但又非常缺乏的特性。我们的目标是在新的 MUVR2 框架中解决这些问题。结果 除了偏最小二乘法和随机森林建模外,MUVR2 算法还包括正则回归框架 elastic net。与其他交叉验证策略相比,MUVR2 始终显示出最先进的性能,包括变量选择,同时最大限度地减少了过拟合。在模拟数据和实际数据的测试中,我们还发现 MUVR2 可以使用弹性网建模调整协变量,而不能使用偏最小二乘法或随机森林建模。可用性与实现 算法、数据、脚本和教程在 GPL-3 许可下开源,可在 https://github.com/MetaboComp/MUVR2 的 MUVR2 R 软件包中获取。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Adjusting for covariates and assessing modeling fitness in machine learning using MUVR2
Abstract Motivation Machine learning (ML) methods are frequently used in Omics research to examine associations between molecular data and for example exposures and health conditions. ML is also used for feature selection to facilitate biological interpretation. Our previous MUVR algorithm was shown to generate predictions and variable selections at state-of-the-art performance. However, a general framework for assessing modeling fitness is still lacking. In addition, enabling to adjust for covariates is a highly desired, but largely lacking trait in ML. We aimed to address these issues in the new MUVR2 framework. Results The MUVR2 algorithm was developed to include the regularized regression framework elastic net in addition to partial least squares and random forest modeling. Compared with other cross-validation strategies, MUVR2 consistently showed state-of-the-art performance, including variable selection, while minimizing overfitting. Testing on simulated and real-world data, we also showed that MUVR2 allows for the adjustment for covariates using elastic net modeling, but not using partial least squares or random forest. Availability and implementation Algorithms, data, scripts, and a tutorial are open source under GPL-3 license and available in the MUVR2 R package at https://github.com/MetaboComp/MUVR2.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
1.60
自引率
0.00%
发文量
0
期刊最新文献
motifbreakR v2: expanded variant analysis including indels and integrated evidence from transcription factor binding databases. TransAnnot-a fast transcriptome annotation pipeline. PatchProt: hydrophobic patch prediction using protein foundation models. Accelerating protein-protein interaction screens with reduced AlphaFold-Multimer sampling. CAPTVRED: an automated pipeline for viral tracking and discovery from capture-based metagenomics samples.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1