Development of machine learning-based models to predict congenital heart disease: A matched case-control study

International Journal of Medical Informatics (IF 3.7; Zone 2, Medicine; JCR Q2, Computer Science, Information Systems). Pub Date: 2024-12-02. DOI: 10.1016/j.ijmedinf.2024.105741
Shutong Zhang, Chenxi Kang, Jing Cui, Haodan Xue, Shanshan Zhao, Yukui Chen, Haixia Lu, Lu Ye, Duolao Wang, Fangyao Chen, Yaling Zhao, Leilei Pei, Pengfei Qu
Volume 195, Article 105741. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S1386505624004040
Citations: 0

Development of machine learning-based models to predict congenital heart disease: A matched case-control study

Background

The current congenital heart disease (CHD) prediction tools lack adequate interpretability and convenience, hindering the development of personalized CHD management strategies. We developed a machine learning-based risk stratification model for CHD prediction.

Methods

This study used data from 1,759 participants in a case-control study of CHD conducted across six birth defects surveillance hospitals in Xi’an, Shaanxi Province, Northwest China, from January 2014 to December 2016. The data were partitioned into training and testing datasets at a ratio of 7:3. Predictors were selected from a total of 47 input variables using the Least Absolute Shrinkage and Selection Operator (LASSO). Five machine learning algorithms were used to build the CHD risk prediction models. Model performance was assessed with a range of metrics, including the area under the receiver operating characteristic curve (AUROC), F1 score, and Brier score. Permutation feature importance was used to interpret the prediction model. The best-performing model was used to derive the risk scores.
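The evaluation steps named in the Methods can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the function names, the fixed toy scoring function, the 0.5 classification threshold, and the six-row toy data are all assumptions made for illustration, and LASSO selection and the five learning algorithms are not reproduced here.

```python
import random

def split_indices(n_rows, train_frac=0.7, seed=0):
    """Shuffle row indices and split them into training and testing sets."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * n_rows)
    return idx[:cut], idx[cut:]

def auroc(y_true, y_score):
    """Probability that a random case scores above a random control (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_score(y_true, y_score, threshold=0.5):
    """F1 after thresholding predicted probabilities (threshold is an assumption)."""
    pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def brier_score(y_true, y_score):
    """Mean squared difference between predicted probability and outcome."""
    return sum((s - y) ** 2 for y, s in zip(y_true, y_score)) / len(y_true)

def permutation_importance(predict, rows, y_true, col, seed=0):
    """Drop in AUROC after shuffling one feature column across rows."""
    baseline = auroc(y_true, [predict(r) for r in rows])
    shuffled = [r[col] for r in rows]
    random.Random(seed).shuffle(shuffled)
    permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
    return baseline - auroc(y_true, [predict(r) for r in permuted])

# Hypothetical toy data: column 0 drives the score, column 1 is ignored.
rows = [[0.9, 5], [0.6, 1], [0.4, 3], [0.7, 2], [0.3, 4], [0.1, 6]]
y_true = [1, 1, 1, 0, 0, 0]

def toy_predict(row):
    """Stand-in for a fitted model: returns column 0 as the predicted probability."""
    return row[0]

y_score = [toy_predict(r) for r in rows]
print(round(auroc(y_true, y_score), 3))        # 0.778
print(round(f1_score(y_true, y_score), 3))     # 0.667
print(round(brier_score(y_true, y_score), 3))  # 0.187
print(permutation_importance(toy_predict, rows, y_true, col=1))  # 0.0, feature unused

train_idx, test_idx = split_indices(1759)
print(len(train_idx), len(test_idx))           # 1231 528
```

A feature the model ignores gets an importance of exactly zero, which is the property that makes permutation importance usable for ranking predictors of any fitted model.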

Results

The eXtreme Gradient Boosting (XGB) model performed best among the CHD prediction models, achieving an AUROC of 0.772 (95% CI 0.728, 0.817) in the testing dataset and 0.738 (0.699, 0.775) in the external validation dataset. The top three predictors identified by the model were living in a rural area, a low wealth index, and folic acid supplementation (<90 days). The resulting risk score showed robust calibration. Using the risk scores, participants were stratified into low-, moderate-, and high-risk categories, which differed substantially in CHD risk.
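As a rough illustration of how an AUROC confidence interval and a three-level risk grouping could be computed, the sketch below uses a percentile bootstrap and two fixed cut-offs. The abstract does not state how the intervals or the risk categories were actually obtained, so the bootstrap method, the cut-off values (0.2 and 0.6), the function names, and the simulated scores are all assumptions.

```python
import random

def auroc(y_true, y_score):
    """Probability that a random case scores above a random control (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (assumed method, not from the paper)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack cases or controls
            stats.append(auroc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def stratify(score, low_cut=0.2, high_cut=0.6):
    """Map a risk score to a category; the cut-offs here are hypothetical."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "moderate"
    return "high"

# Simulated scores: cases tend to score higher than controls.
rng = random.Random(1)
y = [1] * 30 + [0] * 30
s = [rng.uniform(0.3, 1.0) for _ in range(30)] + [rng.uniform(0.0, 0.7) for _ in range(30)]

point = auroc(y, s)
lower, upper = bootstrap_auroc_ci(y, s)
print(f"AUROC {point:.3f} (95% CI {lower:.3f}, {upper:.3f})")
print([stratify(v) for v in (0.1, 0.4, 0.8)])  # ['low', 'moderate', 'high']
```

In practice the cut-offs would be chosen from the training data, for example from score quantiles or observed event rates, rather than fixed in advance.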

Conclusion

This study underscores the feasibility and efficacy of a machine learning-based approach to CHD prediction. The risk scores showed potential for identifying pregnant women at high risk of fetal CHD, offering valuable insights for guiding primary prevention and CHD management.
Source journal
International Journal of Medical Informatics (Medicine; Computer Science: Information Systems)
CiteScore: 8.90
Self-citation rate: 4.10%
Publication volume: 217
Review time: 42 days
About the journal: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; Educational computer-based programs pertaining to medical informatics or medicine in general; Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.