Novel machine learning algorithm in risk prediction model for pan-cancer risk: application in a large prospective cohort

Xifeng Wu, Huakang Tu, Qingfeng Hu, Shan-Pou Tsai, David Ta-Wei Chu, C. Wen
BMJ Oncology. Published 2024-07-01. DOI: 10.1136/bmjonc-2023-000087 (https://doi.org/10.1136/bmjonc-2023-000087)
Citations: 0

Abstract

To develop and validate machine-learning models that predict the risk of pan-cancer incidence using demographic, questionnaire and routine health check-up data in a large Asian population.

This prospective cohort study included 433 549 participants from the MJ cohort, comprising a male cohort (n=208 599) and a female cohort (n=224 950).

During an 8-year median follow-up, 5143 cancers occurred in males and 4764 in females. Compared with Lasso-Cox and Random Survival Forests, XGBoost showed superior performance in both cohorts. The XGBoost model with all 155 features in males and 160 features in females achieved an area under the curve (AUC) of 0.877 and 0.750, respectively. Light models with 31 variables for males and 11 variables for females showed comparable performance. In the male cohort, the AUC was 0.876 (95% CI 0.858 to 0.894) in the overall population and 0.818 (95% CI 0.795 to 0.841) in those aged ≥40 years. In the female cohort, the AUC was 0.746 (95% CI 0.721 to 0.771) in the overall population and 0.641 (95% CI 0.605 to 0.677) in those aged ≥40 years. High-risk individuals had at least a ninefold higher risk of pan-cancer incidence than low-risk groups.

We developed and internally validated the first machine-learning models based on routine health check-up data to predict pan-cancer risk in the general population, and achieved generally good discriminatory ability with a small set of predictors. External validation is warranted before the risk model is implemented in clinical practice.
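The abstract reports discrimination as an AUC with a 95% CI but does not say how either was computed. As a rough illustration only, the sketch below computes an AUC as the share of case/non-case pairs ranked correctly, and attaches a percentile bootstrap interval, which is an assumed method and not necessarily the authors'. The function names (`auc`, `bootstrap_ci`, `crude_incidence`) and the synthetic scores are hypothetical; only the event counts and cohort sizes come from the abstract.

```python
import random


def crude_incidence(events, n):
    """Crude cumulative incidence: events divided by cohort size."""
    return events / n


def auc(scores, labels):
    """Share of (case, non-case) pairs where the case scores higher; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, not from the paper)."""
    rng = random.Random(seed)
    idx = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < len(ys):
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lo = stats[round((alpha / 2) * n_boot)]
    hi = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # Counts taken from the abstract.
    print("male crude incidence:  ", round(crude_incidence(5143, 208599), 4))
    print("female crude incidence:", round(crude_incidence(4764, 224950), 4))

    # Synthetic risk scores (NOT study data): cases are shifted upward by 1 SD.
    rng = random.Random(42)
    labels = [1 if rng.random() < 0.1 else 0 for _ in range(300)]
    scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
    lo, hi = bootstrap_ci(scores, labels)
    print("AUC:", round(auc(scores, labels), 3), "95% CI:", round(lo, 3), "to", round(hi, 3))
```

The pairwise loop is quadratic and only suitable for small illustrations. For a cohort of this size a rank-based implementation would be needed, and for censored follow-up data a time-dependent AUC or concordance index would be the more appropriate measure.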