Pub Date: 2025-12-23 | DOI: 10.1016/j.ijmedinf.2025.106232
Petra Hospodková , Jan Bruthans , Adéla Englová
Introduction
The Patient Summary (PS), a standardized subset of the electronic health record, is designed to provide essential patient information for use in emergencies, unplanned care, and cross-border healthcare. While its technical development has progressed across Europe, little is known about real-world PS adoption and physician perceptions at the national level. This study explores awareness, usage, and perceived barriers to PS adoption among Czech physicians.
Methods
A cross-sectional online survey was distributed to all registered physicians in the Czech Republic between February and March 2025. The questionnaire assessed demographic characteristics, PS usage patterns, perceived benefits and barriers, and alignment with clinical practice. Descriptive statistics were calculated, and non-parametric tests (Wilcoxon rank-sum, Kruskal–Wallis) were used to examine differences by years of experience and medical specialty.
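The rank-based comparison described above can be sketched in plain Python (illustrative only; the usage scores below are invented, not survey responses). The Wilcoxon rank-sum (Mann–Whitney U) statistic ranks the pooled observations, using midranks for ties, and sums the ranks of one group:

```python
# Illustrative sketch (not the study's code): Mann-Whitney U statistic
# with midranks for tied values, as used in a Wilcoxon rank-sum test.

def mann_whitney_u(a, b):
    """U statistic for sample `a` vs sample `b` (midranks for ties)."""
    combined = sorted((v, i) for i, v in enumerate(a + b))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1          # average rank across the tie group
        for k in range(i, j + 1):
            ranks[combined[k][1]] = midrank
        i = j + 1
    rank_sum_a = sum(ranks[:len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2   # U for group a

group_junior = [0, 1, 0, 2, 1, 0]   # hypothetical PS-usage scores
group_senior = [1, 0, 2, 1, 0, 0]
print(mann_whitney_u(group_junior, group_senior))   # 18.0
```

In practice one would use a library routine (e.g. `scipy.stats.mannwhitneyu` or `scipy.stats.kruskal` for more than two groups), which also supplies the p-value.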
Results
A total of 1,739 responses were received (response rate: 4.14 %). Most respondents (66.4 %) reported not using the PS at all, and 72.1 % were unaware that their electronic medical record could be connected to the National Contact Point for eHealth. Only 1.7 % reported a current connection. There was no significant difference in PS use by years of clinical experience (P = 0.391), but a significant difference was observed across specialties (P < 0.001), with the highest usage reported in intensive care medicine and internal medicine.
Discussion and conclusion
Despite recognized benefits, PS usage remains low in the Czech Republic, largely due to limited awareness and system integration. Targeted policy measures, improved communication, and enhanced digital training are needed to support effective adoption.
Title: Physicians’ attitudes toward the patient summary in the Czech Republic: A national cross-sectional survey on awareness, use, and barriers. International Journal of Medical Informatics, vol. 208, Article 106232.
Pub Date: 2025-12-23 | DOI: 10.1016/j.ijmedinf.2025.106239
Priyadharsini Ramamurthy , Zheng Han , Dursun Delen , Zhuqi Miao , Andrew Gin , Xiao Luo , William Paiva
Background
Traumatic brain injury (TBI) is a major risk factor for neurological disorders, including post-traumatic epilepsy (PTE), a debilitating condition associated with significant long-term consequences. The prognosis of PTE occurrence remains challenging due to the complex pathophysiology of PTE and the impracticality of traditional blood biomarker- or imaging-based screening for large populations. This study proposes a graph-based deep learning approach that leverages electronic health records (EHR) to enhance the predictive assessment of PTE risk.
Methods
We utilized Oracle Real-World Data (ORWD) to construct a Heterogeneous Graph Attention Network (HeteroGAT) containing patient and diagnosis nodes, with temporal information represented by patient-to-diagnosis edges and comorbidity connectivity embedded via diagnosis-to-diagnosis edges. The HeteroGAT was trained on a cohort of 1,598,998 TBI-only patients and 102,687 individuals who developed epilepsy after TBI. Model performance was evaluated using sensitivity, specificity, macro F1-score, and area under the receiver operating characteristic curve (AUC-ROC), benchmarked against traditional machine learning models. Attention scores of nodes were used to evaluate node importance. We also assessed the ability of trained HeteroGATs to differentiate early from late PTE following TBI.
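The core attention idea behind a graph attention layer can be illustrated with a toy sketch (not the authors' implementation): a patient node scores each diagnosis-node neighbour, and softmax-normalised weights indicate which diagnoses receive the most attention. All embeddings below are invented.

```python
import math

# Toy single-head attention over a patient node's diagnosis neighbours.
# Real GAT layers use learned projections and multiple heads; this only
# shows the score-then-softmax aggregation step.

def attention_weights(query, neighbours):
    """Softmax over dot-product scores of `query` vs neighbour embeddings."""
    scores = [sum(q * n for q, n in zip(query, nb)) for nb in neighbours]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

patient = [0.5, 1.0]                      # hypothetical patient embedding
diagnoses = {                             # hypothetical diagnosis embeddings
    "TBI":      [1.0, 0.8],
    "seizure":  [0.9, 1.2],
    "fracture": [0.1, 0.2],
}
weights = attention_weights(patient, list(diagnoses.values()))
for name, w in zip(diagnoses, weights):
    print(f"{name}: {w:.3f}")
```

Ranking diagnoses by these weights, as the study does with its attention scores, surfaces the neighbours the model relies on most for a given patient.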
Results
HeteroGAT significantly outperformed conventional models in PTE prediction by effectively integrating demographic data and comorbidity profiles spanning 20 to 500 distinct conditions. The model’s multi-head attention mechanisms, in combination with learned comorbidity connectivity, enhanced its ability to capture complex dependencies within EHR data. HeteroGAT achieved an AUC-ROC of 0.80, outperforming the best-performing traditional model, random forest (AUC-ROC = 0.77). HeteroGAT also demonstrated the capability to differentiate early and late PTE. Ranking nodes by attention score additionally identified clinically relevant predictors of PTE.
Conclusion
By modeling sparse EHR data through patient encounter embeddings, HeteroGAT effectively captures temporal and relational patterns in comorbidities critical for PTE prediction. Our findings highlight the potential of graph-based deep learning models, synergized with large-scale EHR data, in advancing personalized risk assessment, ultimately addressing the urgent need for more precise and proactive management of PTE in TBI patients.
Title: Graph attention network with comorbidity connectivity embedding for post-traumatic epilepsy risk prediction using sparse time-series electronic health records. International Journal of Medical Informatics, vol. 208, Article 106239.
Pub Date: 2025-12-23 | DOI: 10.1016/j.ijmedinf.2025.106240
Jun Guo , Fan Xiong , Baisheng Sun , Mingxing Lei , Yong Qin
Background
Sepsis represents a life-threatening complication in severe orthopedic trauma, significantly increasing short-term mortality risk. Despite the clinical urgency for early prognosis assessment, current predictive tools remain inadequate. To address this gap, this study used a machine learning (ML)-based framework for mortality risk stratification in this high-risk population.
Methods
This retrospective cohort study established ML models to predict 30-day all-cause mortality in critically ill patients with orthopedic trauma and sepsis. Data from 2,060 eligible patients were extracted from the intensive care unit (ICU) of Beth Israel Deaconess Medical Center (2008–2019) in the United States and randomly split into training (80 %) and internal validation (20 %) sets. After handling missing data and addressing class imbalance, seven ML algorithms (including CatBoost [Categorical Boosting], RF [Random Forest], and SVM [Support Vector Machine]) were trained and optimized using 10-fold cross-validation. Model performance was assessed based on discrimination (AUC [Area Under the Curve], accuracy, F1-score), calibration (Brier score, calibration slope), and clinical utility. The top-performing models were further validated on an independent external Chinese cohort (n = 273, 2020–2024).
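The two headline metric families named above, discrimination (AUC) and calibration (Brier score), can be sketched in a few lines of plain Python (illustrative; the predictions and labels are made up, not study data):

```python
# Illustrative metric sketches (not the study's evaluation code).

def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def auc_roc(probs, labels):
    """AUC as the probability a positive case outranks a negative one."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

probs  = [0.9, 0.8, 0.3, 0.2, 0.7, 0.1]   # hypothetical P(death)
labels = [1,   1,   0,   0,   1,   0]      # hypothetical 30-day outcomes
print(auc_roc(probs, labels))              # 1.0: perfect ranking here
print(round(brier_score(probs, labels), 3))
```

Lower Brier scores indicate better calibration; an AUC of 0.5 is chance-level ranking and 1.0 is perfect, which is why the study reports both families side by side.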
Results
The study cohort had a mean age of 62.8 years and a 30-day mortality rate of 19.9 % (410/2060). Non-survivors were significantly older, had a higher comorbidity burden, and more severe physiological derangements. The LASSO analysis identified 16 prognostic variables, with age, hematologic parameters (RDW, WBC), SOFA scores, hemodynamic measures (SBP), and antihypertensive therapy emerging as significant predictors. Among all models, the CatBoost algorithm demonstrated superior performance in the internal validation set, achieving the highest AUC (0.955), accuracy (0.884), and F1-score (0.878), along with excellent calibration (Brier score: 0.081). A soft voting ensemble model, integrating the top three algorithms (CatBoost, RF, SVM), was subsequently constructed. In external validation, this ensemble model generalized robustly, maintaining strong discrimination (AUC: 0.842, Accuracy: 0.737) and calibration (Brier score: 0.173), outperforming the standalone CatBoost model. SHapley Additive exPlanations analysis provided interpretable, individualized risk assessments.
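The soft-voting step can be sketched as follows (illustrative only; the per-model probabilities are invented, and the study's actual ensemble may weight or calibrate its members differently):

```python
# Soft voting sketch: average each model's predicted probability per patient.

def soft_vote(prob_lists):
    """Average per-patient probabilities across models."""
    n_models = len(prob_lists)
    return [sum(ps) / n_models for ps in zip(*prob_lists)]

catboost_p = [0.92, 0.15, 0.40]   # hypothetical P(death within 30 days)
rf_p       = [0.85, 0.25, 0.55]
svm_p      = [0.90, 0.10, 0.35]

ensemble_p = soft_vote([catboost_p, rf_p, svm_p])
print([round(p, 3) for p in ensemble_p])
```

Averaging probabilities (rather than hard majority voting over class labels) preserves each model's confidence, which often smooths out individual models' miscalibration on external data, consistent with the ensemble's better external performance reported above.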
Conclusions
This study trains, optimizes, and evaluates a high-performing ML-based prediction model for 30-day mortality in patients with critical orthopedic trauma and sepsis. The CatBoost model and the soft voting ensemble, particularly the latter, demonstrate strong generalizability and clinical utility, offering a potential tool for early risk stratification and personalized management in this vulnerable population.
Title: Ensemble machine learning for early mortality risk stratification in septic orthopedic trauma: an international cohort study. International Journal of Medical Informatics, vol. 208, Article 106240.
Pub Date: 2025-12-22 | DOI: 10.1016/j.ijmedinf.2025.106234
Zheqing Li , Liyang Tang , Yin Li , Yuanyuan Dang , Lin Yao
Context
Internet hospitals have emerged as a digital innovation in healthcare, optimizing resource allocation and enhancing patient experience. They also support hierarchical diagnosis and treatment and contribute to the Healthy China initiative.
Objectives
To establish a comprehensive evaluation system to promote the sustainable development of Internet hospitals.
Methods
A systematic review of literature related to the evaluation of Internet-based healthcare services was conducted. Using Web of Science and CNKI as data sources, studies published between 2015 and 2024 were screened based on predefined criteria, focusing on high-quality journals and research reports. The selected literature was coded and analyzed across four dimensions: patient services, doctor services, management services, and information security.
Results
The final analysis included 34 papers, with 25 mentioning patient services indicators, 20 mentioning doctor services indicators, 18 mentioning medical services process management indicators, and 9 mentioning information security indicators. This study identifies key evaluation indicators and examines their interrelationships, highlighting potential systemic risks arising from localized optimizations.
Conclusion
This review analyzed Internet hospital evaluation across patient services, doctor services, management services, and information security. While the literature highlights potential efficiency gains, it lacks comprehensive indicators, limiting assessment and improvement. For sustainable development, a more comprehensive evaluation system should integrate multi-stakeholder perspectives (patients, doctors, institutions), address systemic risks from localized optimization, and incorporate coordinated policy considerations.
Title: A review of evaluation system for Internet hospitals. International Journal of Medical Informatics, vol. 209, Article 106234.
Pub Date: 2025-12-22 | DOI: 10.1016/j.ijmedinf.2025.106231
Zhihong Zhang , Mohamad Javad Momeni Nezhad , Seyed Mohammad Bagher Hosseini , Ali Zolnour , Zahra Zonour , Seyedeh Mahdis Hosseini , Maxim Topaz , Maryam Zolnoori
Background
The release of ChatGPT has spurred the widespread adoption of generative large language models (LLMs) in healthcare. This scoping review systematically examines how they are being applied and evaluated in healthcare.
Methods
A systematic search was conducted using PubMed, a comprehensive and representative database on biomedical and health science, to identify studies published between January 1, 2023, and July 30, 2024. Studies were included if they assessed the performance of generative LLMs in healthcare applications; review or perspective articles were excluded.
Results
A total of 415 studies were included, with a significant increase in publications observed after April 2023. Generative LLMs were applied across various medical specialties, primarily supporting clinical decision-making (26.7%) and providing patient information (23.9%). Smaller proportions were focused on professional education and training (18.1%), research (16.1%), and workflow support (12.5%). These applications were mainly supported by three key NLP tasks: question answering (36.1%), text classification (27.5%), and text generation (26.3%). Public datasets appeared in 20% of studies, and 15% used clinical patient data. Of the 98 LLMs used, GPT-4 (51.3%), GPT-3.5 (36.6%), and ChatGPT (22.4%) were the most common. Direct prompting was the most common adaptation method (92.5%), with reinforcement learning rarely utilized (1.4%). Accuracy was the most frequently assessed metric, while errors and safety (9.4%) and time efficiency (7.0%) were less commonly evaluated.
Conclusion
LLMs hold promise across healthcare applications. Expanding their use in workflow optimization, trainee education, and research tools could enhance healthcare delivery and innovation. Comprehensive evaluation using standardized criteria is essential for integrating LLMs into healthcare.
Title: Advancing healthcare with large language models: A scoping review of applications and future directions. International Journal of Medical Informatics, vol. 208, Article 106231.
Pub Date: 2025-12-21 | DOI: 10.1016/j.ijmedinf.2025.106233
João Brainer Clares de Andrade , Thiago S. Carneiro , George N. Nunes Mendes , Joao Pedro Nardari dos Santos , Jussie Correia Lima
Background and purpose
The rapid integration of artificial intelligence (AI) into stroke care has outpaced many clinicians’ ability to critically evaluate and safely implement these tools. We conducted a systematized literature review and developed a practical framework to guide neurologists in the responsible integration of AI into stroke practice.
Methods
We performed a systematized review of PubMed, EMBASE, and gray literature (January 2018–June 2025) following adapted PRISMA guidelines. Search strategies combined AI-related terms with stroke care concepts. We assessed risk of bias using the QUADAS-2, RoB 2, and ROBINS-I tools. Expert consultation with stroke neurologists and AI developers informed framework development.
Results
From 8,635 identified records, 152 studies met inclusion criteria (47 in quantitative synthesis). AI applications spanned large vessel occlusion detection (30 %), ASPECTS scoring (21 %), outcome prediction (18 %), hemorrhage detection (15 %), and treatment selection (16 %). Only 23 % of studies showed low risk of bias, with main concerns including selection bias (29 %), confounding (38 %), and limited external validation (8 % prospective validation). The Clinical-AI Correlation Framework emphasizes three pillars: (1) problem identification and tool selection, (2) clinical correlation using Bayesian reasoning and topographic pattern recognition, and (3) continuous feedback and quality improvement.
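The Bayesian-reasoning component of the framework's second pillar can be illustrated with a small sketch: updating the probability of a finding, such as large vessel occlusion, after a positive AI alert. The sensitivity, specificity, and prevalence figures below are invented for illustration, not drawn from the review.

```python
# Bayes' theorem sketch for interpreting a positive AI alert
# (all numbers hypothetical).

def posterior_positive(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    p_pos = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_pos

prior = 0.10          # hypothetical pre-test LVO probability
post = posterior_positive(prior, sensitivity=0.90, specificity=0.95)
print(round(post, 3))  # 0.667: the alert raises 10% pre-test to ~67%
```

The point of this step in the framework is that the same alert carries very different weight depending on the clinical pre-test probability, which is why the clinician's own assessment stays in the loop.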
Conclusions
Safe AI integration in stroke care requires structured clinical correlation, robust governance frameworks, and continuous monitoring. Our framework provides practical guidance for maintaining clinical judgment while leveraging AI capabilities, emphasizing human oversight for high-risk decisions and systematic documentation of AI-clinician interactions.
{"title":"A clinical-AI correlation for integrating artificial intelligence into stroke care: a systematized literature review and practice framework","authors":"João Brainer Clares de Andrade , Thiago S. Carneiro , George N. Nunes Mendes , Joao Pedro Nardari dos Santos , Jussie Correia Lima","doi":"10.1016/j.ijmedinf.2025.106233","DOIUrl":"10.1016/j.ijmedinf.2025.106233","url":null,"abstract":"<div><h3>Background and purpose</h3><div>The rapid integration of artificial intelligence (AI) into stroke care has outpaced many clinicians’ ability to critically evaluate and safely implement these tools. We conducted a systematized literature review and developed a practical framework to guide neurologists in the responsible integration of AI into stroke practice.</div></div><div><h3>Methods</h3><div>We performed a systematized review of PubMed, EMBASE, and gray literature (January 2018-June 2025) following adapted PRISMA guidelines. Search strategies combined AI-related terms with stroke care concepts. We assessed risk of bias using QUADAS-2, RoB 2, and ROBINS-I tools. Expert consultation with stroke neurologists and AI developers informed framework development.</div></div><div><h3>Results</h3><div>From 8,635 identified records, 152 studies met inclusion criteria (47 in quantitative synthesis). AI applications spanned large vessel occlusion detection (30 %), ASPECTS scoring (21 %), outcome prediction (18 %), hemorrhage detection (15 %), and treatment selection (16 %). Only 23% of studies showed low risk of bias, with main concerns including selection bias (29 %), confounding (38 %), and limited external validation (8 % prospective validation). 
The Clinical-AI Correlation Framework emphasizes three pillars: (1) problem identification and tool selection, (2) clinical correlation using Bayesian reasoning and topographic pattern recognition, and (3) continuous feedback and quality improvement.</div></div><div><h3>Conclusions</h3><div>Safe AI integration in stroke care requires structured clinical correlation, robust governance frameworks, and continuous monitoring. Our framework provides practical guidance for maintaining clinical judgment while leveraging AI capabilities, emphasizing human oversight for high-risk decisions and systematic documentation of AI-clinician interactions.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"208 ","pages":"Article 106233"},"PeriodicalIF":4.1,"publicationDate":"2025-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
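The framework's second pillar invokes Bayesian reasoning for correlating an AI tool's output with clinical context. As a hedged illustration (the sensitivity, specificity, and pre-test figures below are hypothetical, not taken from the reviewed studies), a post-test probability update from an AI result can be sketched with likelihood ratios:

```python
def post_test_probability(pre_test: float, sensitivity: float,
                          specificity: float, positive: bool) -> float:
    """Update a pre-test probability given an AI tool's binary result.

    Uses the standard likelihood-ratio form of Bayes' theorem:
    post-test odds = pre-test odds x LR.
    """
    if positive:
        lr = sensitivity / (1.0 - specificity)   # LR+ for a positive flag
    else:
        lr = (1.0 - sensitivity) / specificity   # LR- for a negative result
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)

# Hypothetical example: suspected large vessel occlusion with a 30 % pre-test
# probability; a detection tool with assumed 90 % sensitivity and 85 %
# specificity returns a positive flag.
p = post_test_probability(0.30, 0.90, 0.85, positive=True)
```

The same arithmetic shows why a negative result from the same tool should lower, but not eliminate, clinical suspicion — the reason the framework keeps human oversight in the loop for high-risk decisions.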
Pub Date : 2025-12-19DOI: 10.1016/j.ijmedinf.2025.106225
Pedro Faustini , Annabelle McIver , Ryan Sullivan , Mark Dras
Background: The digitisation of healthcare has generated vast amounts of data in various formats, including free-text notes, tabular records and medical images. This data is critical for research and innovation, but often contains sensitive information that must be de-identified to ensure patient privacy and regulatory compliance. Natural Language Processing (NLP) enables automated de-identification of sensitive information to safely share medical datasets.
Objective: This study aims to systematically review the literature on NLP-based de-identification techniques applied to free-text medical reports, tabular data, and burned-in text within medical images over the past decade. It seeks to identify state-of-the-art methods, analyse how de-identification tasks are assessed, and find existing gaps for future research.
Methods: We systematically searched five important databases (PubMed, Web of Science, DBLP, ACM and IEEE) for articles published from January 2015 to December 2024 (10 years) about de-identification of medical data in free text, tabular data and burned-in pixels in images. We filtered the articles based on their titles and abstracts against inclusion and exclusion criteria, followed by a quality filter.
Results: From a set of 734 papers, 83 articles were deemed relevant. Most studies de-identify free text; a few work with tabular data, and far fewer deal with text embedded in the pixels of images.
Conclusions: De-identification techniques have evolved, with increased use of Language Models and a decline in recurrence-based neural networks. Off-the-shelf tools often require customisation for optimal performance. Most studies de-identify English content, supported by the prevalence of English datasets. Key challenges include the phenomenon of code-mixing (i.e., more than one language used in the same sentence) and the scarcity of available datasets for reproducibility.
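The redaction step these systems perform can be illustrated with a deliberately minimal rule-based pass. The reviewed approaches rely on trained NER and language models rather than regexes, and the patterns and placeholder labels below are hypothetical, but the sketch shows the basic substitute-with-category operation:

```python
import re

# Illustrative patterns only -- real de-identification systems use NER models
# with far broader coverage (names, addresses, institutions, etc.).
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b"),
}

def redact(text: str) -> str:
    """Replace each matched span with its category placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Pt seen 03/14/2024, MRN: 12345678, callback 555-867-5309."
redacted = redact(note)
```

Replacing spans with category placeholders (rather than deleting them) preserves document structure for downstream research use, which is one reason this output format is common in the surveyed tools.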
{"title":"De-identification of clinical data: A systematic review of free text, image and tabular data approaches","authors":"Pedro Faustini , Annabelle McIver , Ryan Sullivan , Mark Dras","doi":"10.1016/j.ijmedinf.2025.106225","DOIUrl":"10.1016/j.ijmedinf.2025.106225","url":null,"abstract":"<div><div><em>Background:</em> The digitisation of healthcare has generated vast amounts of data in various formats, including free-text notes, tabular records and medical images. This data is critical for research and innovation, but often contains sensitive information that must be de-identified to ensure patient privacy and regulatory compliance. Natural Language Processing (NLP) enables automated de-identification of sensitive information to safely share medical datasets.</div><div><em>Objective:</em> This study aims to systematically review the literature on NLP-based de-identification techniques applied to free-text medical reports, tabular data, and burned-in text within medical images over the past decade. It seeks to identify state-of-the-art methods, analyse how de-identification tasks are assessed, and find existing gaps for future research.</div><div><em>Methods:</em> We systematically searched five important databases (PubMed, Web of Science, DBLP, ACM and IEEE) for articles published from January 2015 to December 2024 (10 years) about de-identification of medical data in free text, tabular data and burned-in pixels in images. We filtered the articles based on their titles and abstracts against inclusion and exclusion criteria, followed by a quality filter.</div><div><em>Results:</em> From a set of 734 papers, 83 articles were deemed relevant. Most studies de-identify free text, with a few working with tabular data and a much scarcer number dealing with text embedded in the pixels of the images.</div><div><em>Conclusions:</em> De-identification techniques have evolved, with increased use of Language Models and a decline in recurrence-based neural networks. 
Off-the-shelf tools often require customisation for optimal performance. Most studies de-identify English content, supported by the prevalence of English datasets. Key challenges include the phenomenon of code-mixing (i.e., more than one language used in the same sentence) and the scarcity of available datasets for reproducibility.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"208 ","pages":"Article 106225"},"PeriodicalIF":4.1,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145841724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-19DOI: 10.1016/j.ijmedinf.2025.106229
David B. Olawade , Osazuwa Ighodaro , Emmanuel Oghenetejiri Erhieyovwe , Nebere Elias Hankamo , Ismail Tajudeen Hamza , Claret Chinenyenwa Analikwu
Background
Emergency care is operationally defined as time-critical acute care across pre-hospital services, emergency departments, and critical care units (excluding routine urgent care and elective admissions), demanding rapid decision-making under pressure. Digital twin technology, creating real-time virtual replicas through continuous data integration, represents a transformative shift in managing acute conditions, resource allocation, and outcome prediction in emergency medicine.
Aim
This review examines the current applications, benefits, challenges, and future directions of digital twin technology in emergency care and medicine, highlighting its potential to revolutionise emergency healthcare delivery.
Method
A comprehensive narrative literature review was conducted using PubMed, IEEE Xplore, Scopus, and Web of Science databases. Studies published between January 2015 and June 2025 focusing on digital twin applications in emergency departments, trauma care, critical care, and prehospital emergency services were included. Grey literature, conference proceedings, and technical reports were also reviewed to capture emerging developments.
Results
Digital twins demonstrate significant utility across multiple emergency care domains including patient monitoring, resource allocation, workflow optimisation, predictive analytics, and training simulations. Key applications include real-time patient condition prediction, emergency department capacity management, trauma response coordination, and personalised treatment planning. Despite promising outcomes, implementation challenges persist, including data integration complexities, computational requirements, and regulatory considerations.
Conclusion
Digital twin technology holds substantial promise for enhancing emergency care delivery through improved decision support, resource optimisation, and predictive capabilities. Continued research, standardisation efforts, and interdisciplinary collaboration are essential for successful clinical integration and widespread adoption.
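The core mechanism — a virtual replica continuously synchronised with streamed patient data and queried for derived risk — can be sketched as a toy object. This is not any system from the reviewed literature, and the scoring thresholds are illustrative placeholders, not clinical guidance:

```python
from dataclasses import dataclass, field

@dataclass
class PatientTwin:
    """Toy virtual replica: latest vitals plus a derived risk score."""
    vitals: dict = field(default_factory=dict)

    def ingest(self, reading: dict) -> None:
        # Continuous data integration: each new reading updates twin state.
        self.vitals.update(reading)

    def risk_score(self) -> int:
        # Simplified scoring loosely inspired by early-warning scores;
        # thresholds here are hypothetical.
        score = 0
        if self.vitals.get("hr", 0) > 110:      # tachycardia
            score += 2
        if self.vitals.get("spo2", 100) < 92:   # hypoxaemia
            score += 3
        if self.vitals.get("sbp", 120) < 90:    # hypotension
            score += 3
        return score

twin = PatientTwin()
twin.ingest({"hr": 118, "spo2": 95})
twin.ingest({"spo2": 90, "sbp": 85})   # later readings overwrite earlier ones
```

Real deployments layer predictive models, workflow simulation, and capacity management on top of this state-synchronisation loop, which is where the data integration and computational challenges noted above arise.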
{"title":"The role of digital twin technology in modern emergency care","authors":"David B. Olawade , Osazuwa Ighodaro , Emmanuel Oghenetejiri Erhieyovwe , Nebere Elias Hankamo , Ismail Tajudeen Hamza , Claret Chinenyenwa Analikwu","doi":"10.1016/j.ijmedinf.2025.106229","DOIUrl":"10.1016/j.ijmedinf.2025.106229","url":null,"abstract":"<div><h3>Background</h3><div>Emergency care is operationally defined as time-critical acute care across pre-hospital services, emergency departments, and critical care units (excluding routine urgent care and elective admissions), demanding rapid decision-making under pressure. Digital twin technology, creating real-time virtual replicas through continuous data integration, represents a transformative shift in managing acute conditions, resource allocation, and outcome prediction in emergency medicine.</div></div><div><h3>Aim</h3><div>This review examines the current applications, benefits, challenges, and future directions of digital twin technology in emergency care and medicine, highlighting its potential to revolutionise emergency healthcare delivery.</div></div><div><h3>Method</h3><div>A comprehensive narrative literature review was conducted using PubMed, IEEE Xplore, Scopus, and Web of Science databases. Studies published between January 2015 and June 2025 focusing on digital twin applications in emergency departments, trauma care, critical care, and prehospital emergency services were included. Grey literature, conference proceedings, and technical reports were also reviewed to capture emerging developments.</div></div><div><h3>Results</h3><div>Digital twins demonstrate significant utility across multiple emergency care domains including patient monitoring, resource allocation, workflow optimisation, predictive analytics, and training simulations. Key applications include real time patient condition prediction, emergency department capacity management, trauma response coordination, and personalised treatment planning. 
Despite promising outcomes, implementation challenges persist, including data integration complexities, computational requirements, and regulatory considerations.</div></div><div><h3>Conclusion</h3><div>Digital twin technology holds substantial promise for enhancing emergency care delivery through improved decision support, resource optimisation, and predictive capabilities. Continued research, standardisation efforts, and interdisciplinary collaboration are essential for successful clinical integration and widespread adoption.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"208 ","pages":"Article 106229"},"PeriodicalIF":4.1,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145799857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-19DOI: 10.1016/j.ijmedinf.2025.106228
Siyi Liu , Zekai Yu
{"title":"Comment on “Medication-based mortality prediction in COPD using machine learning and conventional statistical methods”","authors":"Siyi Liu , Zekai Yu","doi":"10.1016/j.ijmedinf.2025.106228","DOIUrl":"10.1016/j.ijmedinf.2025.106228","url":null,"abstract":"","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"208 ","pages":"Article 106228"},"PeriodicalIF":4.1,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145799805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-18DOI: 10.1016/j.ijmedinf.2025.106230
Emiko Shinohara, Yoshimasa Kawazoe
Background
Named entity recognition (NER) is critical in natural language processing (NLP), particularly in the medical field, where accurate identification of entities, such as patient information and clinical events, is essential. Traditional NER approaches rely heavily on large, annotated corpora, which are resource intensive. Large language models (LLMs) offer new NER approaches, particularly through in-context and few-shot learning.
Objective
This study investigates the effects of incorporating annotation guidelines into prompts for NER via LLMs, with a specific focus on their impact on few-shot learning performance across various medical corpora.
Methods
We designed eight different prompt patterns, combining few-shot examples with annotation guidelines of varying complexity, and evaluated them with three prominent LLMs: GPT-4o, Claude 3.5 Sonnet, and gpt-oss-120b. Additionally, we employed three diverse medical corpora: i2b2-2014, i2b2-2012, and MedTxt-CR. Accuracy was assessed using precision, recall, and the F1 score, with evaluation methods aligned with those used in relevant shared tasks to ensure the comparability of the results.
Results
Our findings indicate that adding detailed annotation guidelines to few-shot prompts improves the recall and F1 score in most cases.
Conclusion
Including annotation guidelines in prompts enhances the performance of LLMs in NER tasks, making this a practical approach for developing accurate NLP systems in resource-constrained environments. Although annotation guidelines are essential for evaluation and example creation, their integration into LLM prompts can further optimize few-shot learning, especially within specialized domains such as medical NLP.
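The prompting strategy the study evaluates — embedding annotation guidelines alongside few-shot examples — can be sketched as plain prompt assembly. The section headers, tag format, and example entities below are illustrative assumptions; the study's actual prompt patterns are not reproduced here:

```python
def build_ner_prompt(guidelines: str,
                     examples: list[tuple[str, list[tuple[str, str]]]],
                     target: str) -> str:
    """Assemble a few-shot NER prompt that embeds annotation guidelines.

    examples: list of (text, [(label, span), ...]) pairs.
    """
    lines = ["## Annotation guidelines", guidelines, "", "## Examples"]
    for text, entities in examples:
        tagged = "; ".join(f"{label}: {span}" for label, span in entities)
        lines += [f"Text: {text}", f"Entities: {tagged}", ""]
    lines += ["## Task", f"Text: {target}", "Entities:"]
    return "\n".join(lines)

# Hypothetical guideline and example, in the spirit of the medical corpora used.
prompt = build_ner_prompt(
    "Tag PROBLEM for diagnoses and DRUG for medication names.",
    [("Started metformin for type 2 diabetes.",
      [("DRUG", "metformin"), ("PROBLEM", "type 2 diabetes")])],
    "Patient denies chest pain; continues aspirin.",
)
```

The resulting string would be sent as the user message to the LLM; varying the detail of the `guidelines` argument is the dimension the study manipulates across its eight prompt patterns.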
{"title":"Efficient medical NER with limited data: Enhancing LLM performance through annotation guidelines","authors":"Emiko Shinohara, Yoshimasa Kawazoe","doi":"10.1016/j.ijmedinf.2025.106230","DOIUrl":"10.1016/j.ijmedinf.2025.106230","url":null,"abstract":"<div><h3>Background</h3><div>Named entity recognition (NER) is critical in natural language processing (NLP), particularly in the medical field, where accurate identification of entities, such as patient information and clinical events, is essential. Traditional NER approaches rely heavily on large, annotated corpora, which are resource intensive. Large language models (LLMs) offer new NER approaches, particularly through in-context and few-shot learning.</div></div><div><h3>Objective</h3><div>This study investigates the effects of incorporating annotation guidelines into prompts for NER via LLMs, with a specific focus on their impact on few-shot learning performance across various medical corpora.</div></div><div><h3>Methods</h3><div>We designed eight different prompt patterns, combining few-shot examples with annotation guidelines of varying complexity, and evaluated their performance via three prominent LLMs: GPT-4o, Claude 3.5 Sonnet, and gpt-oss-120b. Additionally, we employed three diverse medical corpora: i2b2-2014, i2b2-2012, and MedTxt-CR. Accuracy was assessed via precision, recall, and the F1 score, with evaluation methods aligned with those used in relevant shared tasks to ensure the comparability of the results.</div></div><div><h3>Results</h3><div>Our findings indicate that adding detailed annotation guidelines to few-shot prompts improves the recall and F1 score in most cases.</div></div><div><h3>Conclusion</h3><div>Including annotation guidelines in prompts enhances the performance of LLMs in NER tasks, making this a practical approach for developing accurate NLP systems in resource-constrained environments. 
Although annotation guidelines are essential for evaluation and example creation, their integration into LLM prompts can further optimize few-shot learning, especially within specialized domains such as medical NLP.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"208 ","pages":"Article 106230"},"PeriodicalIF":4.1,"publicationDate":"2025-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}