A new risk assessment model of venous thromboembolism by considering fuzzy population.

BMC Medical Informatics and Decision Making, 24(1): 413. Published 2024-12-30. DOI: 10.1186/s12911-024-02834-3
Xin Wang, Yu-Qing Yang, Xin-Yu Hong, Si-Hua Liu, Jian-Chu Li, Ting Chen, Ju-Hong Shi

Abstract

Background: Inpatients at high risk of venous thromboembolism (VTE) face serious threats to both their health and their finances. Many studies that use machine learning (ML) models to predict VTE risk overlook the class-imbalance problem caused by the low incidence of VTE. The resulting models perform poorly and unstably, which prevents them from replacing the Padua model, a linear weighted model widely used in clinical practice. This study aims to develop a new VTE risk assessment model suitable for Chinese medical inpatients.
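The Padua model referred to above is a linear weighted score: each binary risk factor present adds a fixed weight, and the total is compared with a cutoff. The minimal sketch below illustrates that structure, assuming the weights of the commonly published Padua Prediction Score and its conventional high-risk cutoff of 4; neither is restated in this abstract, and the feature names are hypothetical rather than taken from the study's dataset.

```python
# Hypothetical sketch of a linear weighted VTE risk score in the style of the
# Padua Prediction Score. Weights follow the commonly published Padua table;
# the dictionary keys are illustrative names, not fields from the paper's data.
PADUA_WEIGHTS = {
    "active_cancer": 3,
    "previous_vte": 3,
    "reduced_mobility": 3,
    "thrombophilic_condition": 3,
    "recent_trauma_or_surgery": 2,
    "age_70_or_older": 1,
    "heart_or_respiratory_failure": 1,
    "acute_mi_or_ischemic_stroke": 1,
    "acute_infection_or_rheumatologic_disorder": 1,
    "obesity_bmi_30_or_more": 1,
    "ongoing_hormonal_treatment": 1,
}

# A total score >= 4 is conventionally classified as high risk.
HIGH_RISK_THRESHOLD = 4


def padua_score(patient: dict) -> int:
    """Sum the weights of every risk factor marked True in the patient record."""
    return sum(w for factor, w in PADUA_WEIGHTS.items() if patient.get(factor, False))


def is_high_risk(patient: dict) -> bool:
    """Apply the fixed cutoff to the weighted sum."""
    return padua_score(patient) >= HIGH_RISK_THRESHOLD


if __name__ == "__main__":
    example = {"reduced_mobility": True, "age_70_or_older": True}
    print(padua_score(example), is_high_risk(example))  # 4 True
```

Because every weight and the cutoff are fixed, such a score is stable across datasets, which is the baseline the ML models in this study must beat on both sensitivity and specificity.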

Methods: Data on 3284 inpatients admitted to the medical department of Peking Union Medical College Hospital (PUMCH) between January 2014 and June 2016 were collected. The training and test sets were split by admission time, with inpatients admitted from May 2016 to June 2016 forming the test set. We explained the class-imbalance problem from a clinical perspective and defined a new term, "fuzzy population", to describe and model this phenomenon. Taking the fuzzy population into account, we built a new ML VTE risk assessment model through population splitting. The sensitivity and specificity of our method were compared with those of five ML models (support vector machine (SVM), random forest (RF), gradient boosting decision tree (GBDT), logistic regression (LR), and XGBoost) and the Padua model.
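Splitting by admission time, as described above, trains on earlier patients and evaluates on later ones, which is closer to prospective use than a random split. A minimal sketch, assuming each record carries an admission date; the `Admission` record type, its fields, and `temporal_split` are hypothetical, while the dates reflect the January 2014 to June 2016 window and the May to June 2016 test period stated in the Methods.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Admission(NamedTuple):
    patient_id: str  # hypothetical identifier field
    admitted: date   # admission date used for the split
    vte: bool        # outcome label (True = VTE occurred)


def temporal_split(
    records: List[Admission],
    test_start: date = date(2016, 5, 1),
    study_end: date = date(2016, 6, 30),
) -> Tuple[List[Admission], List[Admission]]:
    """Earlier admissions go to training; admissions in the test window go to testing."""
    train = [r for r in records if r.admitted < test_start]
    test = [r for r in records if test_start <= r.admitted <= study_end]
    return train, test


if __name__ == "__main__":
    records = [
        Admission("a", date(2014, 1, 10), False),
        Admission("b", date(2016, 4, 30), True),
        Admission("c", date(2016, 5, 1), False),
        Admission("d", date(2016, 6, 30), True),
    ]
    train, test = temporal_split(records)
    print([r.patient_id for r in train], [r.patient_id for r in test])  # ['a', 'b'] ['c', 'd']
```

Note that the split preserves whatever VTE incidence each period happens to have, so the test set remains imbalanced in the same way the clinical population is.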

Results: The "fuzzy population" phenomenon was explained and verified on the VTE dataset. Compared with the Padua model on the test data, the proposed model achieved higher specificity (64.94% vs. 63.30%) with the same sensitivity (90.24% vs. 90.24%). None of the other five ML models surpassed the Padua model in both sensitivity and specificity at the same time. In addition, our model was more robust than the five ML models, with smaller standard deviations of sensitivity and specificity. Adjusting the distribution of negative samples in the training set according to the fuzzy population further increased the performance instability of the five ML models, which limits the clinical application of these ML methods.
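The comparison above rests on two standard quantities: sensitivity = TP / (TP + FN), the share of VTE patients flagged as high risk, and specificity = TN / (TN + FP), the share of non-VTE patients correctly left unflagged. Robustness is reported as the standard deviation of these metrics. The minimal sketch below assumes that spread is taken over repeated training runs (for example, different random seeds), which the abstract does not specify; the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels where 1 = VTE."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


def summarize_runs(
    y_true: Sequence[int], predictions_per_run: List[Sequence[int]]
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Mean and sample standard deviation of sensitivity and specificity over runs.

    Requires at least two runs, because the sample standard deviation is undefined otherwise.
    """
    results = [sensitivity_specificity(y_true, p) for p in predictions_per_run]
    sens = [s for s, _ in results]
    spec = [sp for _, sp in results]
    return (mean(sens), stdev(sens)), (mean(spec), stdev(spec))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 0]
    runs = [[1, 1, 0, 0, 0], [1, 0, 0, 0, 1]]
    (sens_mean, sens_sd), (spec_mean, spec_sd) = summarize_runs(y_true, runs)
    print(round(sens_mean, 3), round(sens_sd, 3), round(spec_mean, 3), round(spec_sd, 3))
```

A model whose test metrics swing widely across runs (a large standard deviation) is harder to deploy with a fixed operating point, which is why the abstract reports robustness alongside the mean metrics.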

Conclusions: The proposed model matched the sensitivity of the Padua model while achieving higher specificity, and it was more robust than traditional ML models. By modeling the class-imbalance problem, this study built a population-split-based ML model of VTE, and the approach can be applied more broadly to risk assessment for other diseases.
