Multi-ensemble machine learning framework for omics data integration: A case study using breast cancer samples

Kunal Tembhare, Tina Sharma, Sunitha M. Kasibhatla, Archana Achalere, Rajendra Joshi
Informatics in Medicine Unlocked, volume 47, Article 101507. Published 2024-01-01. DOI: 10.1016/j.imu.2024.101507
Full text: https://www.sciencedirect.com/science/article/pii/S2352914824000637

Abstract

Integrating large volumes of omics data helps unravel the biological complexity associated with different disease phenotypes. Machine learning (ML) approaches offer systematic techniques for multi-omics data integration. In this study, survival of breast cancer patients was predicted using omics data from 302 female patients in The Cancer Genome Atlas (TCGA). The data comprised gene expression, miRNA expression, DNA methylation and copy number variation. Three computational multi-ensemble ML pipelines were tested using the Support Vector Machine (SVM), Random Forest (RF) and Partial Least Squares-Discriminant Analysis (PLS-DA) algorithms. To overcome the limitations of univariate feature selection criteria, the ML pipelines were built together with latent factors obtained by a multivariate dimension reduction method. This facilitated investigation of background genetic networks and identification of potential hub genes. The best-performing model was SVM with the PLS-DA method, integrating the gene expression, DNA methylation and miRNA expression modalities, which reached an Area Under the Curve (AUC) of 89% and an accuracy of 83% for survival prediction. The study not only corroborated previously reported breast cancer-specific prognostic biomarkers but also predicted additional potential biomarkers. The work demonstrates the effective use of a multi-ensemble ML model with efficient feature selection methods as a robust protocol for correlating cancer genotype with phenotype.

Source journal: Informatics in Medicine Unlocked (Medicine, Health Informatics)
CiteScore: 9.50
Self-citation rate: 0.00%
Articles published: 282
Review time: 39 days
About the journal: Informatics in Medicine Unlocked (IMU) is an international gold open access journal covering a broad spectrum of topics within medical informatics, including (but not limited to) papers focusing on imaging, pathology, teledermatology, public health, ophthalmological, nursing and translational medicine informatics. The full papers that are published in the journal are accessible to all who visit the website.