Machine learning-based proteogenomic data modeling identifies circulating plasma biomarkers for early detection of lung cancer

Marcela A Johnson, Liping Hou, Bevan Emma Huang, Assieh Saadatpour, Abolfazl Doostparast Torshizi
medRxiv - Genetic and Genomic Medicine. Published 2024-08-01. DOI: 10.1101/2024.07.30.24311241 (https://doi.org/10.1101/2024.07.30.24311241)

Abstract

Identifying genetic variants associated with lung cancer (LC) risk and their impact on plasma protein levels is crucial for understanding LC predisposition. The discovery of risk biomarkers can enhance early LC screening protocols and improve prognostic interventions. In this study, we performed a genome-wide association analysis using the UK Biobank and FinnGen. We identified genetic variants associated with LC and protein levels by leveraging the UK Biobank Pharma Proteomics Project. The dysregulated proteins were then analyzed in pre-symptomatic LC cases compared to healthy controls, followed by training machine learning models to predict future LC diagnosis. Based on 5-fold cross-validation, we achieved median AUCs ranging from 0.79 to 0.88 at 0-4 years before diagnosis (YBD), 0.73 to 0.83 at 5-9 YBD, and 0.78 to 0.84 at 0-9 YBD. Conducting survival analysis using the 5-9 YBD cohort, we identified eight proteins, including CALCB, PLAUR/uPAR, and CD74, whose higher levels were associated with worse overall survival. We also identified potential plasma biomarkers, including previously reported candidates such as CEACAM5, CXCL17, GDF15, and WFDC2, which have shown associations with future LC diagnosis. These proteins are enriched in various pathways, including cytokine signaling, interleukin regulation, neutrophil degranulation, and lung fibrosis. In conclusion, this study generates novel insights into the genome-proteome dynamics of LC. Furthermore, our findings present a promising panel of non-invasive plasma biomarkers that hold potential to support early LC screening initiatives and enhance future diagnostic interventions.
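The evaluation protocol named in the abstract (median AUC over 5-fold cross-validation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the four "protein" features are hypothetical, and the centroid-difference scorer is a toy stand-in rather than any model the authors trained.

```python
import random
from statistics import mean, median


def auc(scores, labels):
    """AUC as the chance that a random case outscores a random control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with a similar case/control ratio in each."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_centroid_scorer(X, y):
    """Toy model (assumption): project onto the case-minus-control mean profile."""
    n_feat = len(X[0])
    mu1 = [mean(x[f] for x, lab in zip(X, y) if lab == 1) for f in range(n_feat)]
    mu0 = [mean(x[f] for x, lab in zip(X, y) if lab == 0) for f in range(n_feat)]
    w = [a - b for a, b in zip(mu1, mu0)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


def cross_validated_aucs(X, y, k=5, seed=0):
    """Fit on k-1 folds, score the held-out fold, and return one AUC per fold."""
    folds = stratified_folds(y, k, random.Random(seed))
    aucs = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        score = fit_centroid_scorer([X[i] for i in train_idx], [y[i] for i in train_idx])
        aucs.append(auc([score(X[i]) for i in test_idx], [y[i] for i in test_idx]))
    return aucs


# Synthetic cohort (assumption): 60 future cases, 240 controls, 4 hypothetical proteins
# whose levels are shifted upward in cases.
rng = random.Random(42)
y = [1] * 60 + [0] * 240
X = [[rng.gauss(0.8 * label, 1.0) for _ in range(4)] for label in y]

aucs = cross_validated_aucs(X, y)
print("per-fold AUC:", [round(a, 3) for a in aucs])
print("median AUC:", round(median(aucs), 3))
```

Reporting the median across folds, as the abstract does, makes the summary less sensitive to a single unusually easy or hard held-out fold than the mean would be.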