
International Journal of Medical Informatics: Latest Articles

De-identification of clinical data: A systematic review of free text, image and tabular data approaches
IF 4.1 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-15 | Epub Date: 2025-12-19 | DOI: 10.1016/j.ijmedinf.2025.106225
Pedro Faustini, Annabelle McIver, Ryan Sullivan, Mark Dras
Background: The digitisation of healthcare has generated vast amounts of data in various formats, including free-text notes, tabular records and medical images. This data is critical for research and innovation, but often contains sensitive information that must be de-identified to ensure patient privacy and regulatory compliance. Natural Language Processing (NLP) enables automated de-identification of sensitive information to safely share medical datasets.
Objective: This study aims to systematically review the literature on NLP-based de-identification techniques applied to free-text medical reports, tabular data, and burned-in text within medical images over the past decade. It seeks to identify state-of-the-art methods, analyse how de-identification tasks are assessed, and find existing gaps for future research.
Methods: We systematically searched five important databases (PubMed, Web of Science, DBLP, ACM and IEEE) for articles published from January 2015 to December 2024 (10 years) about de-identification of medical data in free text, tabular data and burned-in pixels in images. We filtered the articles based on their titles and abstracts against inclusion and exclusion criteria, followed by a quality filter.
Results: From a set of 734 papers, 83 articles were deemed relevant. Most studies de-identify free text, a few work with tabular data, and far fewer deal with text embedded in the pixels of images.
Conclusions: De-identification techniques have evolved, with increased use of Language Models and a decline in recurrence-based neural networks. Off-the-shelf tools often require customisation for optimal performance. Most studies de-identify English content, supported by the prevalence of English datasets. Key challenges include the phenomenon of code-mixing (i.e., more than one language used in the same sentence) and the scarcity of available datasets for reproducibility.
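As a rough illustration of what rule-based de-identification of free text involves, the sketch below masks a few identifier types with regular expressions. It is not a method taken from the review: the patterns, the placeholder tags and the sample note are all hypothetical, and the abstract indicates that recent work leans on language models rather than hand-written rules alone.

```python
import re

# Hypothetical patterns for illustration only; real de-identification
# systems cover many more identifier types and usually combine rules
# with trained models.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
    ("MRN", re.compile(r"\bMRN[:\s]*\d+\b")),
]


def deidentify(text: str) -> str:
    """Replace every match of each pattern with a bracketed tag."""
    for tag, pattern in PATTERNS:
        text = pattern.sub(f"[{tag}]", text)
    return text


# Hypothetical clinical note, invented for this example.
note = "Seen on 03/12/2021, MRN: 445821. Call 555-201-3344 to follow up."
print(deidentify(note))
```

A rule set like this is easy to audit but brittle, which is consistent with the abstract's remark that off-the-shelf tools often need customisation.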
International Journal of Medical Informatics, Volume 208, Article 106225.
Citations: 0
Physicians’ attitudes toward the patient summary in the Czech Republic: A national cross-sectional survey on awareness, use, and barriers
IF 4.1 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-15 | Epub Date: 2025-12-23 | DOI: 10.1016/j.ijmedinf.2025.106232
Petra Hospodková, Jan Bruthans, Adéla Englová

Introduction

The Patient Summary (PS), a standardized subset of the electronic health record, is designed to provide essential patient information for use in emergencies, unplanned care, and cross-border healthcare. While its technical development has progressed across Europe, little is known about real-world PS adoption and physician perceptions at the national level. This study explores awareness and usage of the PS, and perceived barriers to its adoption, among Czech physicians.

Methods

A cross-sectional online survey was distributed to all registered physicians in the Czech Republic between February and March 2025. The questionnaire assessed demographic characteristics, PS usage patterns, perceived benefits and barriers, and alignment with clinical practice. Descriptive statistics were calculated, and non-parametric tests (Wilcoxon rank-sum, Kruskal–Wallis) were used to examine differences by years of experience and medical specialty.

Results

A total of 1,739 responses were received (response rate: 4.14 %). Most respondents (66.4 %) reported not using the PS at all, and 72.1 % were unaware that their electronic medical record could be connected to the National Contact Point for eHealth. Only 1.7 % reported a current connection. There was no significant difference in PS use by years of clinical experience (P = 0.391), but a significant difference was observed across specialties (P < 0.001), with the highest usage reported in intensive care medicine and internal medicine.
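The reported figures can be cross-checked with simple arithmetic. The sketch below back-calculates the number of physicians the survey reached from the response count and response rate, and converts the reported percentages into approximate respondent counts. It assumes every percentage uses all 1,739 respondents as its denominator, which the abstract does not state.

```python
responses = 1739
response_rate = 0.0414  # 4.14 %, as reported

# Implied number of physicians who received the survey.
implied_invited = responses / response_rate
print(round(implied_invited))

# Approximate respondent counts, ASSUMING each percentage is taken
# over all 1,739 respondents (not confirmed by the abstract).
non_users = round(responses * 0.664)   # reported not using the PS at all
unaware = round(responses * 0.721)     # unaware of the eHealth connection option
connected = round(responses * 0.017)   # reported a current connection
print(non_users, unaware, connected)
```

Under that assumption the survey reached roughly 42,000 physicians, and only about 30 respondents reported a current connection.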

Discussion and conclusion

Despite recognized benefits, PS usage remains low in the Czech Republic, largely due to limited awareness and system integration. Targeted policy measures, improved communication, and enhanced digital training are needed to support effective adoption.
International Journal of Medical Informatics, Volume 208, Article 106232.
Citations: 0
The role of digital twin technology in modern emergency care
IF 4.1 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-15 | Epub Date: 2025-12-19 | DOI: 10.1016/j.ijmedinf.2025.106229
David B. Olawade, Osazuwa Ighodaro, Emmanuel Oghenetejiri Erhieyovwe, Nebere Elias Hankamo, Ismail Tajudeen Hamza, Claret Chinenyenwa Analikwu

Background

Emergency care is operationally defined as time-critical acute care across pre-hospital services, emergency departments, and critical care units (excluding routine urgent care and elective admissions), demanding rapid decision-making under pressure. Digital twin technology, creating real-time virtual replicas through continuous data integration, represents a transformative shift in managing acute conditions, resource allocation, and outcome prediction in emergency medicine.

Aim

This review examines the current applications, benefits, challenges, and future directions of digital twin technology in emergency care and medicine, highlighting its potential to revolutionise emergency healthcare delivery.

Method

A comprehensive narrative literature review was conducted using PubMed, IEEE Xplore, Scopus, and Web of Science databases. Studies published between January 2015 and June 2025 focusing on digital twin applications in emergency departments, trauma care, critical care, and prehospital emergency services were included. Grey literature, conference proceedings, and technical reports were also reviewed to capture emerging developments.

Results

Digital twins demonstrate significant utility across multiple emergency care domains, including patient monitoring, resource allocation, workflow optimisation, predictive analytics, and training simulations. Key applications include real-time patient condition prediction, emergency department capacity management, trauma response coordination, and personalised treatment planning. Despite promising outcomes, implementation challenges persist, including data integration complexities, computational requirements, and regulatory considerations.

Conclusion

Digital twin technology holds substantial promise for enhancing emergency care delivery through improved decision support, resource optimisation, and predictive capabilities. Continued research, standardisation efforts, and interdisciplinary collaboration are essential for successful clinical integration and widespread adoption.
International Journal of Medical Informatics, Volume 208, Article 106229.
Citations: 0
A clinical-AI correlation for integrating artificial intelligence into stroke care: a systematized literature review and practice framework
IF 4.1 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-15 | Epub Date: 2025-12-21 | DOI: 10.1016/j.ijmedinf.2025.106233
João Brainer Clares de Andrade, Thiago S. Carneiro, George N. Nunes Mendes, Joao Pedro Nardari dos Santos, Jussie Correia Lima

Background and purpose

The rapid integration of artificial intelligence (AI) into stroke care has outpaced many clinicians’ ability to critically evaluate and safely implement these tools. We conducted a systematized literature review and developed a practical framework to guide neurologists in the responsible integration of AI into stroke practice.

Methods

We performed a systematized review of PubMed, EMBASE, and gray literature (January 2018 to June 2025) following adapted PRISMA guidelines. Search strategies combined AI-related terms with stroke care concepts. We assessed risk of bias using QUADAS-2, RoB 2, and ROBINS-I tools. Expert consultation with stroke neurologists and AI developers informed framework development.

Results

From 8,635 identified records, 152 studies met inclusion criteria (47 in quantitative synthesis). AI applications spanned large vessel occlusion detection (30 %), ASPECTS scoring (21 %), outcome prediction (18 %), hemorrhage detection (15 %), and treatment selection (16 %). Only 23 % of studies showed low risk of bias, with main concerns including selection bias (29 %), confounding (38 %), and limited external validation (8 % prospective validation). The Clinical-AI Correlation Framework emphasizes three pillars: (1) problem identification and tool selection, (2) clinical correlation using Bayesian reasoning and topographic pattern recognition, and (3) continuous feedback and quality improvement.
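The second pillar refers to Bayesian reasoning. One common reading of that idea, sketched here as an assumption rather than as the framework's own procedure, is to update a clinician's pre-test probability with the result of an AI tool using Bayes' rule. The function name and all numbers below are hypothetical.

```python
def post_test_probability(pre_test: float, sensitivity: float,
                          specificity: float, positive: bool = True) -> float:
    """Update a pre-test probability with a binary test result via Bayes' rule."""
    if positive:
        true_pos = sensitivity * pre_test
        false_pos = (1.0 - specificity) * (1.0 - pre_test)
        return true_pos / (true_pos + false_pos)
    false_neg = (1.0 - sensitivity) * pre_test
    true_neg = specificity * (1.0 - pre_test)
    return false_neg / (false_neg + true_neg)


# Hypothetical inputs: 30 % clinical suspicion, tool with 90 % sensitivity
# and 90 % specificity. None of these values come from the review.
after_positive = post_test_probability(0.30, 0.90, 0.90, positive=True)
after_negative = post_test_probability(0.30, 0.90, 0.90, positive=False)
print(f"{after_positive:.3f} {after_negative:.3f}")
```

With these illustrative inputs a positive flag raises the probability to about 0.79 and a negative result lowers it to about 0.05, which shows why the same tool output should be read differently at different levels of clinical suspicion.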

Conclusions

Safe AI integration in stroke care requires structured clinical correlation, robust governance frameworks, and continuous monitoring. Our framework provides practical guidance for maintaining clinical judgment while leveraging AI capabilities, emphasizing human oversight for high-risk decisions and systematic documentation of AI-clinician interactions.
International Journal of Medical Informatics, Volume 208, Article 106233.
Citations: 0
Application of Machine learning in predicting cancer complications using longitudinal Data: A systematic review and Meta-Analysis
IF 4.1 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-15 | Epub Date: 2025-12-05 | DOI: 10.1016/j.ijmedinf.2025.106217
Abu Sarwar Zamani, Abdelwahed Motwakel Eltayeb, Adel Alluhayb, Md. Mobin Akhtar, Rashid Ayub, Mohammed Abdelmonem Ahmed Abdelrahim, Sara Saadeldeen Ibrahim Mohamed, Naved Ahmad
Predicting cancer complications such as metastasis, recurrence, and treatment side effects is important for improving patient outcomes. Machine learning (ML) applied to lifetime data has great potential to improve prediction accuracy in oncology; however, no systematic review of the subject exists. This systematic review and meta-analysis (SRMA) assesses the accuracy of ML models based on longitudinal studies for the estimation of cancer-related complications. Articles published from 2020 to 2024 were identified in the PubMed, Google Scholar, and IEEE Xplore databases. Seven of the studies reviewed analyzed ML models that employed longitudinal data for cancer complication prognosis. The risk of bias of the included studies was assessed using the Cochrane Risk of Bias tool, and for diagnostic accuracy, the QUADAS-2 tool was used. Information on ML techniques, prediction accuracy, and results was extracted. The pooled area under the curve (AUC) for immune-related adverse events prediction was 0.78 (95% CI: 0.73–0.83). For cancer recurrence and mortality prediction, pooled AUCs ranged from 0.70 to 0.75. Machine learning models integrating clinical, genomic, and imaging data demonstrated superior predictive accuracy across various cancer types. Models predicting quality of life deterioration during treatment showed an AUC of 0.82. ML models applying longitudinal data effectively predict cancer complications, with improved accuracy when integrating multimodal data. These models offer promising tools for clinical decision-making in oncology.
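A reported confidence interval can be converted back into an approximate standard error, which helps when comparing pooled estimates. The sketch below does this for the pooled AUC of 0.78 (95% CI: 0.73–0.83). It assumes a symmetric normal-approximation interval on the AUC scale; the abstract does not say how the interval was computed, so the result is indicative only.

```python
from statistics import NormalDist

auc, lower, upper = 0.78, 0.73, 0.83  # values reported in the abstract

# Two-sided 95 % critical value of the standard normal (about 1.96).
z = NormalDist().inv_cdf(0.975)

# ASSUMPTION: the interval is auc +/- z * SE on the untransformed AUC scale.
standard_error = (upper - lower) / (2 * z)
midpoint = (lower + upper) / 2

print(f"SE = {standard_error:.4f}, interval midpoint = {midpoint:.2f}")
```

The interval midpoint coincides with the point estimate, which is at least consistent with the symmetric-interval assumption.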
International Journal of Medical Informatics, Volume 208, Article 106217.
Citations: 0
Graph attention network with comorbidity connectivity embedding for post-traumatic epilepsy risk prediction using sparse time-series electronic health records
IF 4.1 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-15 | Epub Date: 2025-12-23 | DOI: 10.1016/j.ijmedinf.2025.106239
Priyadharsini Ramamurthy, Zheng Han, Dursun Delen, Zhuqi Miao, Andrew Gin, Xiao Luo, William Paiva

Background

Traumatic brain injury (TBI) is a major risk factor for neurological disorders, including post-traumatic epilepsy (PTE), a debilitating condition associated with significant long-term consequences. Predicting PTE occurrence remains challenging because of the complex pathophysiology of PTE and the impracticality of traditional blood biomarker- or imaging-based screening for large populations. This study proposes a graph-based deep learning approach that leverages electronic health records (EHR) to enhance the predictive assessment of PTE risk.

Methods

We utilized Oracle Real-World Data (ORWD) to construct a Heterogeneous Graph Attention Network (HeteroGAT) that contains patient and diagnosis nodes, with temporal information represented using patient-to-diagnosis edges, and comorbidity connectivity embedded using diagnosis-to-diagnosis edges. The HeteroGAT was trained on a cohort of 1,598,998 TBI-only patients and 102,687 individuals who developed epilepsy after TBI. Model performance was evaluated using sensitivity, specificity, macro F1-score, and area under the receiver operating characteristic curve (AUC-ROC), benchmarked against traditional machine learning models. Attention scores of nodes were used to evaluate node importance. The ability of HeteroGAT models trained to differentiate early from late PTE following TBI was also assessed.
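To make the graph layout concrete, the sketch below builds the two edge types from a handful of toy records using plain Python containers. Everything in it is hypothetical: the records, the `days_since_tbi` attribute, and in particular the choice to weight diagnosis-to-diagnosis edges by co-occurrence counts, since the abstract does not say how comorbidity connectivity is defined.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (patient id, diagnosis, days since TBI).
encounters = [
    ("p1", "headache", 3),
    ("p1", "depression", 40),
    ("p2", "headache", 10),
    ("p2", "depression", 15),
    ("p2", "stroke", 200),
]

# Patient-to-diagnosis edges carry the temporal attribute.
patient_dx_edges = [
    (pid, dx, {"days_since_tbi": days}) for pid, dx, days in encounters
]

# Collect the set of diagnoses recorded for each patient.
dx_by_patient = defaultdict(set)
for pid, dx, _ in encounters:
    dx_by_patient[pid].add(dx)

# Diagnosis-to-diagnosis edges, ASSUMED here to be weighted by the number
# of patients in whom the two diagnoses co-occur.
comorbidity_edges = defaultdict(int)
for dx_set in dx_by_patient.values():
    for a, b in combinations(sorted(dx_set), 2):
        comorbidity_edges[(a, b)] += 1

print(len(patient_dx_edges), dict(comorbidity_edges))
```

Sorting each diagnosis set before pairing keeps every undirected edge under a single key, so `("depression", "headache")` and its reverse are never counted separately.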

Results

HeteroGAT significantly outperformed conventional models in PTE prediction by effectively integrating demographic data and comorbidity profiles spanning from 20 to 500 distinct conditions. The model’s multi-head attention mechanisms, in combination with learned comorbidity connectivity, enhanced its ability to capture complex dependencies within EHR data. HeteroGAT achieved an AUC-ROC of 0.80, outperforming the best-performing traditional model, random forest (AUC-ROC = 0.77). HeteroGAT also demonstrated capabilities in differentiating early and late PTEs. Ranking of nodes based on attention scores also identified predictors of PTE that are clinically relevant.
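The attention mechanism mentioned above can be illustrated with a single attention head over one node's neighbours. The sketch follows the general form used in graph attention networks (a LeakyReLU score on concatenated features, softmax normalisation, weighted aggregation) but omits the learned projection matrix for brevity. It is not the authors' implementation, and the feature vectors and attention vector are hypothetical.

```python
import math


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def attention_weights(target, neighbours, a):
    """Softmax-normalised attention of one target node over its neighbours.

    Scores take the form LeakyReLU(a . [h_target || h_neighbour]); the
    learned projection W is omitted (treated as identity) for brevity.
    """
    scores = [
        leaky_relu(sum(w * x for w, x in zip(a, target + h)))
        for h in neighbours
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(neighbours, weights):
    """Attention-weighted sum of neighbour feature vectors."""
    dim = len(neighbours[0])
    return [sum(w * h[i] for w, h in zip(weights, neighbours)) for i in range(dim)]


# Hypothetical 2-d features for a patient node and three diagnosis nodes.
patient = [1.0, 0.0]
diagnoses = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a = [0.5, 0.5, 1.0, -1.0]  # hypothetical attention vector, length 2 * dim

weights = attention_weights(patient, diagnoses, a)
updated = aggregate(diagnoses, weights)
print([round(w, 3) for w in weights], [round(v, 3) for v in updated])
```

Sorting neighbours by their weight gives the kind of attention-based ranking the abstract describes; a multi-head version would repeat this with several attention vectors and combine the results.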

Conclusion

By modeling sparse EHR data through patient encounter embeddings, HeteroGAT effectively captures temporal and relational patterns in comorbidities critical for PTE prediction. Our findings highlight the potential of graph-based deep learning models, synergized with large-scale EHR data, in advancing personalized risk assessment, ultimately addressing the urgent need for more precise and proactive management of PTE in TBI patients.
Citations: 0
Ensemble machine learning for early mortality risk stratification in septic orthopedic trauma: an international cohort study
IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-03-15 Epub Date: 2025-12-23 DOI: 10.1016/j.ijmedinf.2025.106240
Jun Guo, Fan Xiong, Baisheng Sun, Mingxing Lei, Yong Qin

Background

Sepsis represents a life-threatening complication in severe orthopedic trauma, significantly increasing short-term mortality risk. Despite the clinical urgency for early prognosis assessment, current predictive tools remain inadequate. To address this gap, this study used a machine learning (ML)-based framework for mortality risk stratification in this high-risk population.

Methods

This retrospective cohort study established ML models to predict 30-day all-cause mortality in critically ill patients with orthopedic trauma and sepsis. Data from 2,060 eligible patients were extracted from the intensive care unit (ICU) of Beth Israel Deaconess Medical Center (2008–2019) in the United States and randomly split into training (80 %) and internal validation (20 %) sets. After handling missing data and addressing class imbalance, seven ML algorithms (including CatBoost [Categorical Boosting], RF [Random Forest], and SVM [Support Vector Machine]) were trained and optimized using 10-fold cross-validation. Model performance was assessed based on discrimination (AUC [Area Under the Curve], accuracy, F1-score), calibration (Brier score, calibration slope), and clinical utility. The top-performing models were further validated on an independent external Chinese cohort (n = 273, 2020–2024).
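The Brier score used for calibration is the mean squared difference between predicted probability and observed outcome, so lower is better and 0 is perfect. A minimal sketch follows; the function name `brier_score` and the toy values are hypothetical and are not drawn from the study.

```python
def brier_score(y_true, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(probs) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Hypothetical predictions for four patients (1 = died within 30 days).
outcomes = [1, 0, 1, 0]
predicted = [0.9, 0.1, 0.8, 0.3]
score = brier_score(outcomes, predicted)
```

A model that always predicts the base rate scores p(1 - p), which gives a rough reference point when reading reported Brier scores.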

Results

The study cohort had a mean age of 62.8 years and a 30-day mortality rate of 19.9 % (410/2060). Non-survivors were significantly older, had a higher comorbidity burden, and had more severe physiological derangements. The LASSO analysis identified 16 prognostic variables, with age, hematologic parameters (RDW, WBC), SOFA scores, hemodynamic measures (SBP), and antihypertensive therapy emerging as significant predictors. Among all models, the CatBoost algorithm demonstrated superior performance in the internal validation set, achieving the highest AUC (0.955), accuracy (0.884), and F1-score (0.878), along with excellent calibration (Brier score: 0.081). A soft voting ensemble model, integrating the top three algorithms (CatBoost, RF, SVM), was subsequently constructed. In external validation, this ensemble model generalized robustly, maintaining strong discrimination (AUC: 0.842, accuracy: 0.737) and calibration (Brier score: 0.173), outperforming the standalone CatBoost model. SHapley Additive exPlanations analysis provided interpretable, individualized risk assessments.
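Soft voting combines models by averaging their predicted probabilities rather than their hard labels. The sketch below shows the idea with equal weights by default; the function name `soft_vote`, the optional weights, the 0.5 threshold, and the toy probabilities are assumptions, since the abstract does not report how the three models were weighted.

```python
def soft_vote(model_probs, weights=None, threshold=0.5):
    """Average per-model probabilities and threshold the result.

    model_probs: one list of positive-class probabilities per model,
    all of the same length. Returns (averaged probabilities, labels).
    """
    if not model_probs:
        raise ValueError("need at least one model")
    n_samples = len(model_probs[0])
    if any(len(p) != n_samples for p in model_probs):
        raise ValueError("all models must score the same samples")
    if weights is None:
        weights = [1.0] * len(model_probs)   # equal weighting (assumed)

    total = sum(weights)
    averaged = [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_samples)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Hypothetical outputs standing in for CatBoost, RF, and SVM on three patients.
avg, labels = soft_vote([[0.9, 0.2, 0.6], [0.8, 0.4, 0.3], [0.7, 0.3, 0.45]])
```

Averaging probabilities lets a confident model outweigh two uncertain ones, which is one reason an ensemble may transfer to an external cohort better than any single member.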

Conclusions

This study trains, optimizes, and evaluates a high-performing ML-based prediction model for 30-day mortality in patients with critical orthopedic trauma and sepsis. The CatBoost model and the soft voting ensemble, particularly the latter, demonstrate strong generalizability and clinical utility, offering a potential tool for early risk stratification and personalized management in this vulnerable population.
Citations: 0
Development and validation of a bleeding risk model for off-pump coronary artery bypass grafting: a multi-center retrospective cohort study
IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-03-15 Epub Date: 2025-12-16 DOI: 10.1016/j.ijmedinf.2025.106226
Zi Wang, Runhua Ma, Qiming Wang, Fan Yang, Xiaotong Xia, Xiaoyu Li, Qing Xu, Yao yao, Hongyi Wu, Chunsheng Wang, Qianzhou Lv

Background

Perioperative bleeding is a major challenge in coronary artery bypass grafting (CABG). Existing bleeding risk models often lack specificity for off-pump CABG (OPCABG) patients.

Objective

This study aims to develop and validate a novel perioperative bleeding prediction model tailored for OPCABG patients.

Methods

This retrospective, multi-center cohort study was conducted using both internal and external validation cohorts. Fourteen different models, including Binary Logistic Regression, Random Forest, Decision Tree, Extra Trees, Adaptive Boosting, Extreme Gradient Boosting, Categorical Boosting, Gradient Boosting, Naive Bayes, Artificial Neural Network, Light Gradient Boosting Machine, K-nearest Neighbors, Support Vector Machine, and LogitBoost, were applied for model development. SHapley Additive exPlanations (SHAP) was used to interpret feature importance and the model’s outputs.

Results

The final model, CABG Bleeding Risk of 10 Variables (CABG-BR10), was built using Random Forest. This model identified 10 key variables: antiplatelet drug discontinuation, N-terminal pro B-type natriuretic peptide, activated partial thromboplastin time, hemoglobin, urea, cardiac troponin T, estimated glomerular filtration rate, total bilirubin, fibrinogen, and international normalized ratio. In the internal and external validation cohorts, the model demonstrated solid performance, with receiver operating characteristic area under the curve (ROC-AUC) values of 0.90 and 0.87 and precision-recall area under the curve (PR-AUC) values of 0.70 and 0.67, respectively. SHAP analysis identified key predictors of bleeding risk, and an online tool was developed to facilitate bleeding risk assessment.

Conclusion

The CABG-BR10 model accurately predicts perioperative bleeding risk in OPCABG patients, outperforming traditional scoring systems and providing interpretable, clinically relevant insights into bleeding risk factors.
Citations: 0
Construction and validation of a machine learning-based risk prediction model for invasive mechanical ventilation in AECOPD patients complicated with respiratory failure
IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-03-15 Epub Date: 2025-12-27 DOI: 10.1016/j.ijmedinf.2025.106244
Xin Jiang, Ji Li, Jingjing Ju, Hao Ding, Sufang Yang

Objective

This study aimed to create and validate a machine learning (ML) model to predict the likelihood of invasive mechanical ventilation (IMV) in patients with acute exacerbation of chronic obstructive pulmonary disease (AECOPD) complicated by respiratory failure.

Methods

Data from patients diagnosed with AECOPD and respiratory failure were retrospectively extracted from the Medical Information Mart for Intensive Care-IV (MIMIC-IV). A total of 551 cases were split 7:3 into a training set (385 cases) for model construction and an internal validation set (166 cases). IMV served as the outcome event. Features were selected with the Boruta algorithm and least absolute shrinkage and selection operator (LASSO). Eight ML algorithms were trained with 10-fold cross-validation: XGBoost, decision tree (DT), random forest (RF), support-vector machine (SVM), LightGBM, CatBoost, Gaussian naïve Bayes (NB), and K-nearest neighbor (KNN). Model performance was assessed by the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, F1 score, calibration curve, decision curve and clinical impact curve. An external validation cohort of 100 AECOPD-respiratory failure patients admitted to Baoying People’s Hospital between January 2020 and August 2025 was collected. The final best model was interpreted with SHapley Additive exPlanations (SHAP) to clarify feature importance and decision logic, and an interactive dynamic nomogram was plotted to increase readability.
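The 7:3 split of 551 cases into 385 and 166 follows from taking the integer part of 0.7 × 551 = 385.7 for training and leaving the rest for validation. A minimal sketch of a seeded random split is given below; the function name `split_indices` and the seed are hypothetical, and the abstract does not say whether the authors' split was stratified by outcome.

```python
import random


def split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle indices 0..n-1 and cut them into train and validation parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)    # seeded for reproducibility
    n_train = int(n * train_fraction)       # floor: int(385.7) gives 385
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_indices(551)
print(len(train_idx), len(val_idx))
```

With an outcome as unevenly distributed as IMV can be, a stratified split is a common alternative, since it keeps the event rate similar in both sets.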

Results

Boruta plus LASSO identified total calcium, partial pressure of oxygen (PO2), oxygen saturation (SpO2) and sepsis as significant predictors. XGBoost outperformed the other algorithms, achieving an internal validation accuracy of 72.2 %, sensitivity of 64.6 %, specificity of 79.8 %, F1 score of 69.7 % and AUC of 0.813 (95 % CI 0.748–0.878). The external validation accuracy reached 76.4 %, the sensitivity reached 82.6 %, the specificity reached 70.0 %, the F1 score reached 78.7 %, and the AUC reached 0.840 (95 % CI 0.801–0.879). SHAP analysis further indicated that PO2 and SpO2 were the primary drivers of model decisions. An interactive dynamic nomogram was successfully constructed.
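The accuracy, sensitivity, specificity, and F1 figures above all derive from the four cells of a confusion matrix. The sketch below shows those relationships for binary labels; the function name `binary_metrics` and the toy labels are hypothetical, and the counts are not reconstructed from the study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and F1 for labels where 1 = event."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(num, den):
        return num / den if den else 0.0    # avoid division by zero

    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "sensitivity": safe_div(tp, tp + fn),    # recall on the event class
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical labels: 3 events, 5 non-events.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])
```

Sensitivity and specificity do not depend on how common the event is, whereas accuracy and F1 do, which is worth keeping in mind when comparing internal and external cohorts.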

Conclusion

IMV in AECOPD patients with respiratory failure was associated with total calcium, PO2, and SpO2 levels and sepsis. The developed XGBoost model demonstrated good predictive value for IMV in this clinical population.
Citations: 0
Machine learning-based diagnosis of autism spectrum disorder in children and adolescents using eye-tracking data: a systematic review and meta-analysis
IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-03-15 Epub Date: 2025-12-25 DOI: 10.1016/j.ijmedinf.2025.106235
Wenhao Han, Xinyu Yang, Xin Li, Jiacheng Wang, Juan Liu, Wei Pang

Objective

Eye-tracking technology has been increasingly investigated as an objective approach for distinguishing individuals with Autism Spectrum Disorder (ASD) from typically developing (TD) individuals. Artificial intelligence and machine learning (ML) methods have been widely applied to support ASD diagnosis and treatment, and prior studies suggest that ML models leveraging eye-tracking data can achieve high diagnostic accuracy. This systematic review and meta-analysis aimed to evaluate the diagnostic performance of machine-learning models using eye-tracking data to distinguish children and adolescents with ASD from TD peers.

Methods

We systematically searched PubMed, Embase, Web of Science, IEEE Xplore, Scopus, and the Cochrane Library from inception to August 3, 2025. We included studies that applied ML methods to eye-tracking data to distinguish children with ASD from TD children. We extracted data on participant characteristics, model performance, eye-tracking protocols, and machine-learning algorithms. The review protocol was registered in PROSPERO (CRD420251162462).

Results

We identified 1,045 records, of which 25 studies were included in the meta-analysis. The included studies comprised 2,319 participants, with sample sizes ranging from 32 to 529 per study. The pooled accuracy, sensitivity, and specificity of machine-learning models using eye-tracking data to distinguish children with ASD from TD children were 85 % (95 % CI, 81–89 %), 86 % (95 % CI, 82–89 %), and 86 % (95 % CI, 79–91 %), respectively. These results suggest that eye-tracking–based machine-learning approaches have good diagnostic performance for identifying ASD.
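A pooled proportion with a 95 % CI, such as the pooled sensitivity above, can be illustrated with a simple fixed-effect, inverse-variance pooling on the logit scale. This is only a sketch of one textbook approach: the abstract does not state which pooling model the review used (diagnostic meta-analyses often use bivariate random-effects models instead), and the function name `pool_proportions` and the study counts below are hypothetical.

```python
import math


def pool_proportions(events, totals, z=1.96):
    """Fixed-effect inverse-variance pooling of proportions on the logit scale.

    Returns (pooled estimate, lower bound, upper bound) as proportions.
    A 0.5 continuity correction is always applied so the logit stays finite.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for e, n in zip(events, totals):
        e_c = e + 0.5               # corrected event count
        f_c = (n - e) + 0.5         # corrected non-event count
        logit = math.log(e_c / f_c)
        variance = 1.0 / e_c + 1.0 / f_c
        weight = 1.0 / variance     # more precise studies weigh more
        weighted_sum += weight * logit
        weight_total += weight

    pooled_logit = weighted_sum / weight_total
    se = math.sqrt(1.0 / weight_total)

    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return (
        inv_logit(pooled_logit),
        inv_logit(pooled_logit - z * se),
        inv_logit(pooled_logit + z * se),
    )


# Hypothetical per-study true positives and ASD sample sizes.
estimate, lower, upper = pool_proportions([45, 80, 30], [50, 100, 40])
```

A fixed-effect model assumes every study estimates the same underlying value; with the substantial between-study heterogeneity the authors report, a random-effects model would produce wider intervals.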

Conclusion

Eye-tracking–based machine-learning approaches show considerable potential for distinguishing children with ASD from TD children. However, the robustness and generalizability of these findings are limited by the lack of external validation, small sample sizes, and substantial between-study heterogeneity. To establish generalizability, future research should prioritize standardized eye-tracking paradigms and large-scale, prospective, multicenter study designs with external validation. Such efforts may facilitate the translation of these models into clinical practice as objective and efficient adjunctive screening tools.
Citations: 0