利用机器学习预测癌症患者的静脉血栓栓塞(VTE)风险

Samir Khan Townsley, Debraj Basu, Jayneel Vora, Ted Wun, Chen-Nee Chuah, Prabhu R. V. Shankar
{"title":"利用机器学习预测癌症患者的静脉血栓栓塞(VTE)风险","authors":"Samir Khan Townsley,&nbsp;Debraj Basu,&nbsp;Jayneel Vora,&nbsp;Ted Wun,&nbsp;Chen-Nee Chuah,&nbsp;Prabhu R. V. Shankar","doi":"10.1002/hcs2.55","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background</h3>\n \n <p>The association between cancer and venous thromboembolism (VTE) is well-established with cancer patients accounting for approximately 20% of all VTE incidents. In this paper, we have performed a comparison of machine learning (ML) methods to traditional clinical scoring models for predicting the occurrence of VTE in a cancer patient population, identified important features (clinical biomarkers) for ML model predictions, and examined how different approaches to reducing the number of features used in the model impact model performance.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>We have developed an ML pipeline including three separate feature selection processes and applied it to routine patient care data from the electronic health records of 1910 cancer patients at the University of California Davis Medical Center.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>Our ML-based prediction model achieved an area under the receiver operating characteristic curve of 0.778 ± 0.006 (mean ± SD) when trained on a set of 15 features. This result is comparable with the model performance when trained on all features in our feature pool [0.779 ± 0.006 (mean ± SD) with 29 features]. Our result surpasses the most validated clinical scoring system for VTE risk assessment in cancer patients by 16.1%. We additionally found cancer stage information to be a useful predictor after all performed feature selection processes despite not being used in existing score-based approaches.</p>\n </section>\n \n <section>\n \n <h3> Conclusion</h3>\n \n <p>From these findings, we observe that ML can offer new insights and a significant improvement over the most validated clinical VTE risk scoring systems in cancer patients. The results of this study also allowed us to draw insight into our feature pool and identify the features that could have the most utility in the context of developing an efficient ML classifier. While a model trained on our entire feature pool of 29 features significantly outperformed the traditionally used clinical scoring system, we were able to achieve an equivalent performance using a subset of only 15 features through strategic feature selection methods. These results are encouraging for potential applications of ML to predicting cancer-associated VTE in clinical settings such as in bedside decision support systems where feature availability may be limited.</p>\n </section>\n </div>","PeriodicalId":100601,"journal":{"name":"Health Care Science","volume":"2 4","pages":"205-222"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/hcs2.55","citationCount":"0","resultStr":"{\"title\":\"Predicting venous thromboembolism (VTE) risk in cancer patients using machine learning\",\"authors\":\"Samir Khan Townsley,&nbsp;Debraj Basu,&nbsp;Jayneel Vora,&nbsp;Ted Wun,&nbsp;Chen-Nee Chuah,&nbsp;Prabhu R. V. Shankar\",\"doi\":\"10.1002/hcs2.55\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n \\n <section>\\n \\n <h3> Background</h3>\\n \\n <p>The association between cancer and venous thromboembolism (VTE) is well-established with cancer patients accounting for approximately 20% of all VTE incidents. In this paper, we have performed a comparison of machine learning (ML) methods to traditional clinical scoring models for predicting the occurrence of VTE in a cancer patient population, identified important features (clinical biomarkers) for ML model predictions, and examined how different approaches to reducing the number of features used in the model impact model performance.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Methods</h3>\\n \\n <p>We have developed an ML pipeline including three separate feature selection processes and applied it to routine patient care data from the electronic health records of 1910 cancer patients at the University of California Davis Medical Center.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Results</h3>\\n \\n <p>Our ML-based prediction model achieved an area under the receiver operating characteristic curve of 0.778 ± 0.006 (mean ± SD) when trained on a set of 15 features. This result is comparable with the model performance when trained on all features in our feature pool [0.779 ± 0.006 (mean ± SD) with 29 features]. Our result surpasses the most validated clinical scoring system for VTE risk assessment in cancer patients by 16.1%. We additionally found cancer stage information to be a useful predictor after all performed feature selection processes despite not being used in existing score-based approaches.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Conclusion</h3>\\n \\n <p>From these findings, we observe that ML can offer new insights and a significant improvement over the most validated clinical VTE risk scoring systems in cancer patients. The results of this study also allowed us to draw insight into our feature pool and identify the features that could have the most utility in the context of developing an efficient ML classifier. While a model trained on our entire feature pool of 29 features significantly outperformed the traditionally used clinical scoring system, we were able to achieve an equivalent performance using a subset of only 15 features through strategic feature selection methods. These results are encouraging for potential applications of ML to predicting cancer-associated VTE in clinical settings such as in bedside decision support systems where feature availability may be limited.</p>\\n </section>\\n </div>\",\"PeriodicalId\":100601,\"journal\":{\"name\":\"Health Care Science\",\"volume\":\"2 4\",\"pages\":\"205-222\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/hcs2.55\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Health Care Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/hcs2.55\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Health Care Science","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/hcs2.55","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

背景癌症与静脉血栓栓塞症(VTE)之间的关联已得到证实,癌症患者约占所有VTE事件的20%。在本文中,我们将机器学习(ML)方法与传统的临床评分模型进行了比较,以预测癌症患者群体中VTE的发生,确定了ML模型预测的重要特征(临床生物标志物),并研究了减少模型中使用的特征数量的不同方法如何影响模型性能。方法我们开发了一个包括三个独立特征选择过程的ML管道,并将其应用于加州大学戴维斯医学中心1910名癌症患者的电子健康记录中的常规患者护理数据。结果我们基于ML的预测模型在接收机工作特性曲线下的面积为0.778 ± 0.006(平均值 ± SD)。当对我们的特征库中的所有特征进行训练时,该结果与模型性能相当[7.779 ± 0.006(平均值 ± SD)具有29个特征]。我们的结果超过了癌症患者VTE风险评估最有效的临床评分系统16.1%。我们还发现,尽管在现有的基于评分的方法中没有使用,但在所有进行特征选择过程后,癌症分期信息是一个有用的预测因素。结论从这些发现中,我们观察到ML可以为癌症患者提供新的见解,并比最有效的临床VTE风险评分系统有显著改善。这项研究的结果还使我们能够深入了解我们的特征库,并确定在开发高效ML分类器的背景下最有用的特征。虽然在我们29个特征的整个特征库上训练的模型显著优于传统使用的临床评分系统,但我们能够通过战略性特征选择方法,仅使用15个特征的子集来实现同等的性能。这些结果对于ML在临床环境中预测癌症相关VTE的潜在应用是令人鼓舞的,例如在特征可用性可能有限的床边决策支持系统中。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

摘要图片

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Predicting venous thromboembolism (VTE) risk in cancer patients using machine learning

Background

The association between cancer and venous thromboembolism (VTE) is well-established with cancer patients accounting for approximately 20% of all VTE incidents. In this paper, we have performed a comparison of machine learning (ML) methods to traditional clinical scoring models for predicting the occurrence of VTE in a cancer patient population, identified important features (clinical biomarkers) for ML model predictions, and examined how different approaches to reducing the number of features used in the model impact model performance.

Methods

We have developed an ML pipeline including three separate feature selection processes and applied it to routine patient care data from the electronic health records of 1910 cancer patients at the University of California Davis Medical Center.

Results

Our ML-based prediction model achieved an area under the receiver operating characteristic curve of 0.778 ± 0.006 (mean ± SD) when trained on a set of 15 features. This result is comparable with the model performance when trained on all features in our feature pool [0.779 ± 0.006 (mean ± SD) with 29 features]. Our result surpasses the most validated clinical scoring system for VTE risk assessment in cancer patients by 16.1%. We additionally found cancer stage information to be a useful predictor after all performed feature selection processes despite not being used in existing score-based approaches.

Conclusion

From these findings, we observe that ML can offer new insights and a significant improvement over the most validated clinical VTE risk scoring systems in cancer patients. The results of this study also allowed us to draw insight into our feature pool and identify the features that could have the most utility in the context of developing an efficient ML classifier. While a model trained on our entire feature pool of 29 features significantly outperformed the traditionally used clinical scoring system, we were able to achieve an equivalent performance using a subset of only 15 features through strategic feature selection methods. These results are encouraging for potential applications of ML to predicting cancer-associated VTE in clinical settings such as in bedside decision support systems where feature availability may be limited.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
0.90
自引率
0.00%
发文量
0
期刊最新文献
Study protocol: A national cross-sectional study on psychology and behavior investigation of Chinese residents in 2023. Caregiving in Asia: Priority areas for research, policy, and practice to support family caregivers. Innovative public strategies in response to COVID-19: A review of practices from China. Sixty years of ethical evolution: The 2024 revision of the Declaration of Helsinki (DoH). A novel ensemble ARIMA-LSTM approach for evaluating COVID-19 cases and future outbreak preparedness.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1