International Journal of Medical Informatics: Latest Articles (Page 6)

Association between the platelet to white blood cell ratio and short term mortality in critically ill patients with atherosclerotic cardiovascular disease: A retrospective study and machine learning with external validation

IF 4.1, Zone 2 (Medicine), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

International Journal of Medical Informatics

Pub Date : 2026-04-01 Epub Date: 2026-01-06 DOI: 10.1016/j.ijmedinf.2026.106267

Zhantao Cao, Hewei Qin, Yue Niu, Zilu Zhang, Guoju Dong

Background

The platelet to white blood cell ratio (PWR) has shown prognostic value in many diseases, yet its predictive utility for patients with atherosclerotic cardiovascular disease (ASCVD) treated in the intensive care unit (ICU) remains uncertain. We examined whether PWR at ICU admission is associated with short-term all-cause mortality among ICU patients with ASCVD.

Methods

We used the MIMIC-IV and eICU databases to study the association between PWR and 30-day all-cause mortality in critically ill patients with ASCVD. Patients were grouped by PWR quartiles. Collinearity was checked with the variance inflation factor (VIF). Nonlinearity was assessed with restricted cubic splines (RCS). Survival was compared with Kaplan-Meier (KM) curves and the log-rank test. Hazard ratios were estimated with stratified and adjusted Cox models. We also built machine learning models to predict 30-day mortality from PWR and clinical features selected with the Boruta algorithm. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC). External validation in the eICU database was used to assess generalizability.
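
The quartile grouping step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the lab values are hypothetical, the function names `pwr` and `quartile_groups` are invented for this example, and the cut points use the default method of Python's `statistics.quantiles`, which may differ from the method used in the study.

```python
from statistics import quantiles


def pwr(platelets: float, wbc: float) -> float:
    """Platelet to white blood cell ratio; both counts must share a unit (e.g. 10^9/L)."""
    if wbc <= 0:
        raise ValueError("WBC count must be positive")
    return platelets / wbc


def quartile_groups(values: list[float]) -> list[int]:
    """Assign each value to quartile 1..4 using cut points computed from the cohort."""
    q1, q2, q3 = quantiles(values, n=4)  # three cut points split the cohort into four groups
    groups = []
    for v in values:
        if v <= q1:
            groups.append(1)
        elif v <= q2:
            groups.append(2)
        elif v <= q3:
            groups.append(3)
        else:
            groups.append(4)
    return groups


# Hypothetical admission labs (platelets, WBC); not data from the study.
labs = [(150, 15), (200, 10), (250, 10), (300, 10), (180, 6), (240, 6), (320, 8), (90, 18)]
ratios = [pwr(p, w) for p, w in labs]
groups = quartile_groups(ratios)
```

Quartile labels produced this way could then serve as the grouping variable for Kaplan-Meier curves or as a categorical covariate in a Cox model.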

Results

A total of 10,943 ICU patients with ASCVD were included; 62% were men, and the median age was 71 years. The highest PWR quartile had lower 30-day all-cause mortality than the lowest quartile (11% versus 15%, p < 0.001). In multivariable Cox models, the highest quartile had a lower risk (HR 0.80, 95% CI 0.67 to 0.95, p = 0.012). RCS suggested a nonlinear association. Age modified the association, with a stronger protective effect in patients younger than 70 years (HR 0.64, 95% CI 0.47 to 0.87, interaction p < 0.001). The best machine learning model achieved an AUC of 0.812 in internal validation and 0.80 in external validation. SHAP analysis showed that higher PWR was linked to a lower predicted risk of death.

Conclusion

PWR independently predicts 30-day all-cause mortality in ICU patients with ASCVD. These findings support the use of PWR for risk stratification and to inform management in critical care for ASCVD.

Volume 209, Article 106267

Citations: 0

Leveraging large language models to automate the identification of healthcare access barriers for veterans

Pub Date : 2026-04-01 Epub Date: 2025-12-30 DOI: 10.1016/j.ijmedinf.2025.106247

Sudarshan Srinivasan, Caitlin Rizy, Maria Mahbub, David Bolme, Alina Peluso, Jodie Trafton, Ioana Danciu

Objective

To develop and evaluate an automated system that uses large language models (LLMs) to identify healthcare barriers, focusing on transportation issues, in veterans’ clinical notes, and to assess the impact of different prompting strategies on classification performance and explanation consistency.

Methods

We developed a hybrid system combining pattern matching for templated notes with LLM analysis for free-text notes. Using 2000 manually annotated clinical notes, we compared four prompting strategies (dual-role short, dual-role long, analysis-first, analysis-only) across Mistral-7B and Llama-3.1 models. We evaluated classification performance using standard metrics and assessed explanation consistency through embedding similarity analysis.
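
The hybrid routing idea (deterministic pattern matching for templated notes, LLM analysis for everything else) can be sketched roughly as below. The template pattern, the labels, and `stub_llm` are all hypothetical placeholders introduced for illustration; the abstract does not describe the actual templates or prompts.

```python
import re
from typing import Callable

# Hypothetical template field; real templated notes would need their own patterns.
TEMPLATE_RE = re.compile(r"transportation barrier:\s*(yes|no)", re.IGNORECASE)


def classify_note(note: str, llm_classify: Callable[[str], str]) -> tuple[str, str]:
    """Return (label, route): templated notes are resolved by regex, the rest by the LLM."""
    match = TEMPLATE_RE.search(note)
    if match:
        label = "barrier" if match.group(1).lower() == "yes" else "no_barrier"
        return label, "pattern"
    return llm_classify(note), "llm"


def stub_llm(note: str) -> str:
    """Stand-in for a model call; a real system would prompt an LLM here."""
    return "barrier" if "no ride" in note.lower() else "no_barrier"


templated = classify_note("Transportation barrier: Yes", stub_llm)
free_text = classify_note("Veteran reports no ride to clinic this week.", stub_llm)
```

Passing the classifier in as a callable keeps the deterministic path testable without any model dependency.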

Results

The analysis-first strategy achieved superior performance, with Mistral-7B reaching an F1 score of 0.914, outperforming traditional machine learning approaches (GBM: 0.786, BERT: 0.811). LLMs demonstrated higher explanation consistency within models (mean cosine similarity 0.887–0.908) compared to cross-model similarities (0.767–0.872). Pattern matching successfully handled 6.7% of templated notes deterministically. Mistral-7B showed greater internal consistency but higher abstention rates compared to Llama-3.1.
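
Explanation consistency measured by mean cosine similarity can be illustrated with a small sketch. It assumes each explanation has already been converted to an embedding vector by some model; the abstract does not name the embedding model, and the helper names here are invented for the example.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def mean_pairwise_cosine(embeddings: list[list[float]]) -> float:
    """Average cosine similarity over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

Within-model consistency would apply `mean_pairwise_cosine` to explanations from one model, while a cross-model comparison would pair explanations from different models for the same note.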

Conclusion

Requiring LLMs to analyze evidence before classification improves both accuracy and explanation consistency for identifying transportation barriers in clinical notes. This approach enables automated barrier detection at scale while providing clinically relevant explanations, supporting both population-level healthcare planning and individual patient care decisions.

Volume 209, Article 106247

Citations: 0

A topic modeling analysis of stigma dimensions, social, and related behavioral circumstances in clinical notes among patients with HIV

Pub Date : 2026-04-01 Epub Date: 2026-01-06 DOI: 10.1016/j.ijmedinf.2026.106269

Ziyi Chen, Yiyang Liu, Mattia Prosperi, Krishna Vaddiparti, Robert L. Cook, Jiang Bian, Yi Guo, Yonghui Wu

Objective

To characterize stigma dimensions, social, and related behavioral circumstances in people living with HIV (PLWHs) seeking care, using natural language processing methods applied to a large collection of electronic health record (EHR) clinical notes from a large integrated health system in the southeast United States.

Methods

We identified a cohort of PLWHs from the University of Florida (UF) Health Integrated Data Repository and performed topic modeling analysis using Latent Dirichlet Allocation (LDA) to uncover stigma-related dimensions and related social and behavioral contexts. Domain experts created a seed list of HIV-related stigma keywords, then applied a snowball strategy to iteratively review notes for additional terms until saturation was reached. To identify more target topics, we tested three keyword-based filtering strategies. The detected topics were evaluated using three widely used metrics and manually reviewed by specialists. Word frequency analysis was used to highlight the prevalent terms associated with each topic. In addition, we conducted topic variation analysis among subgroups to examine differences across age- and sex-specific demographics.
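
Keyword-based filtering of note sentences, the step that precedes topic modeling, might look roughly like this. The keywords and notes below are invented examples (the study's 91-term list is not given in the abstract), and the sentence splitter is deliberately naive.

```python
import re


def split_sentences(note: str) -> list[str]:
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def filter_sentences(notes: list[str], keywords: list[str]) -> list[str]:
    """Keep only sentences that contain at least one keyword as a whole word."""
    if not keywords:
        return []
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [s for note in notes for s in split_sentences(note) if pattern.search(s)]


# Hypothetical notes and seed keywords, for illustration only.
notes = [
    "Patient feels ashamed of diagnosis. Labs reviewed today.",
    "Reports feeling judged by family! Follow up in 3 months.",
]
kept = filter_sentences(notes, ["ashamed", "judged"])
```

The retained sentences would then be tokenized and passed to an LDA implementation of choice.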

Results

We identified 9,140 PLWHs at UF Health and collected 2.9 million clinical notes. Through the iterative keyword approach, we generated a list of 91 keywords associated with HIV-related stigma. Topic modeling on sentences containing at least one keyword uncovered a wide range of topic themes associated with HIV-related stigma and with social and related behavioral circumstances, including “Mental Health Concern and Stigma”, “Social Support and Engagement”, “Limited Healthcare Access and Severe Illness”, “Missed Appointments and HIV Care Monitoring”, “Treatment Refusal and Isolation”, “Intimate Partner Violence and Relationship Concerns”, “Fear of Falling and Physical Health Concerns”, “Substance Abuse”, and “Food Insecurity and Resource Scarcity”. Topic variation analysis across sex and age subgroups revealed no substantial difference between males and females; however, differences were observed across age groups. For example, “Fear of Falling and Physical Health Concerns” was notably more prevalent among older adults.

Conclusion

Extracting and understanding HIV-related stigma and associated social and behavioral circumstances from EHR clinical notes enables scalable, time-efficient assessment and overcomes the limitations of traditional questionnaires. Findings from this research provide actionable insights to inform patient care and interventions to improve HIV-care outcomes.

Volume 209, Article 106269

Citations: 0

Predicting emergency mortality risk in traumatic brain injury: comparative analysis of machine learning and large language model GPT-5

Pub Date : 2026-04-01 Epub Date: 2026-01-05 DOI: 10.1016/j.ijmedinf.2026.106268

Kuan-Chi Tu, Yung-De Kuo, Tee-Tau Eric Nyam, Yu-Shan Ma, Mei-I Sung, Chung-Feng Liu, Ching-Lung Kuo

Background

Artificial intelligence (AI) has become increasingly important in predicting outcomes of traumatic brain injury (TBI). Traditional machine learning (ML) models, such as support vector machines (SVMs), have shown high accuracy, whereas the potential of large language models (LLMs) in structured clinical prediction remains underexplored.

Purpose

This study compared the predictive performance of ML and LLM approaches (ChatGPT-5 with Thinking response mode) using the same TBI dataset and evaluated the impact of prompting strategies and threshold calibration on model reliability and clinical applicability.

Methods

A dataset of 5,475 TBI cases with 12 clinical features was used to build an SVM model and four LLM strategies: zero-shot GPT, few-shot GPT, few-shot + chain-of-thought (CoT) GPT, and CoT-only GPT. Performance was evaluated by accuracy, sensitivity, specificity, and ROC-AUC under fixed and balanced thresholds.
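
The contrast between a fixed and a balanced threshold can be shown with a short sketch. It assumes that "balanced" means the cut-off maximizing Youden's J (sensitivity + specificity - 1); the abstract does not state the exact criterion, and the labels and scores below are toy values, not study data.

```python
def sens_spec(y_true: list[int], scores: list[float], threshold: float) -> tuple[float, float]:
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_threshold(y_true: list[int], scores: list[float]) -> float:
    """Pick the observed score that maximizes Youden's J (assumed balancing criterion)."""
    best_t, best_j = scores[0], -1.0
    for t in sorted(set(scores)):
        se, sp = sens_spec(y_true, scores, t)
        j = se + sp - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Toy labels (1 = death) and predicted risks; hypothetical values only.
y = [0, 0, 0, 0, 1, 1]
risk = [0.05, 0.10, 0.20, 0.35, 0.30, 0.60]
fixed = sens_spec(y, risk, 0.5)
tuned_t = balanced_threshold(y, risk)
tuned = sens_spec(y, risk, tuned_t)
```

In this toy case the fixed 0.5 cut-off misses one of the two deaths, while the tuned cut-off catches both at the cost of one false alarm, which is the kind of trade-off threshold calibration adjusts.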

Results

The SVM model achieved the best overall performance (AUC = 0.920). After threshold adjustment, all LLMs reached comparable AUCs (0.902–0.919). Few-shot GPT most closely matched SVM; CoT + few-shot achieved the highest sensitivity; and CoT-only favored specificity.

Conclusion

Proper threshold calibration enables LLMs to approximate ML accuracy while offering rapid deployment and interpretability. Prompt engineering combined with adaptive cut-off tuning may enhance clinical usability of LLM-based prediction systems.

Volume 209, Article 106268

Citations: 0

Beyond binary diagnosis: Key questions on AI accuracy, real-world applicability, and safety in clinical decision support

Pub Date : 2026-04-01 Epub Date: 2026-01-17 DOI: 10.1016/j.ijmedinf.2026.106292

Jin Ye

This comment relates to Kücking et al.’s (2026) study on the bidirectional effects of artificial intelligence recommendations and healthcare provider-related factors on the accuracy of wound impregnation diagnosis. While acknowledging the valuable contributions of this research, including distinguishing between correct and incorrect artificial intelligence outputs, rigorous simulation design, and emphasis on clinical safety, we raise key questions to enhance the interpretation of results and real-world translation. The main focuses include the moderating role of artificial intelligence system accuracy in automation bias, external effectiveness in real clinical environments, potential mechanisms for gender differences in diagnostic performance, the impact of visual cue design on decision-making, and the potential of explainable artificial intelligence (XAI) in risk mitigation. This comment aims to promote further research and facilitate the safe and effective integration of artificial intelligence-based clinical decision support systems (CDSS) into clinical practice.

Citations: 0

Development and validation of data-driven, decision tree–based algorithms for identifying Behçet’s disease in claims data

Pub Date : 2026-04-01 Epub Date: 2026-01-05 DOI: 10.1016/j.ijmedinf.2026.106266

Ken-ei Sada, Yoshia Miyawaki, Ryo Yanai, Takashi Kida, Akira Onishi, Ryusuke Yoshimi, Kunihiro Ichinose, Yasuhiro Shimojima

Objective

To develop and externally validate novel, data-driven algorithms that are based on appropriate variable selection methods for identifying patients with Behçet’s disease in Japan.

Methods

This retrospective cross-sectional study included 13,538 patients from six tertiary hospitals (November–December 2023). One year of claims data was linked to chart-confirmed Behçet’s disease diagnoses. Patients were randomly divided into training (n = 8,811) and test (n = 3,775) sets, with external validation (n = 952) from another hospital. Feature selection among Behçet’s disease-coded patients used the Least Absolute Shrinkage and Selection Operator, Boruta, and Recursive Feature Elimination. The diagnostic performance of the rule-based algorithms, which were derived from the decision tree models, was evaluated using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value, and F1 score.

Results

Diagnosis codes alone achieved high sensitivity (1.000) and specificity (0.992) but modest PPV (0.767, test set; 0.850, external validation). Incorporating sulphamethoxazole–trimethoprim and colchicine prescriptions improved the positive predictive value, which was 0.793 in the test set and 0.865 in external validation.
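
Why near-perfect sensitivity and specificity can still yield a modest PPV follows from the definitions, sketched below. The confusion-matrix counts and the 2% prevalence are hypothetical values chosen for illustration; they are not the study's counts, which the abstract does not report.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' rule: true positives over all positives per unit of population."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts, not taken from the study.
m = diagnostic_metrics(tp=90, fp=10, fn=0, tn=900)

# With the reported sensitivity (1.000) and specificity (0.992) and an ASSUMED
# prevalence of 2%, the implied PPV is about 0.72.
implied_ppv = ppv_from_rates(1.0, 0.992, 0.02)
```

Because the false-positive term scales with the much larger disease-free population, any added criterion that removes false positives without losing true cases raises PPV, which matches the direction of the reported improvement.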

Conclusion

Incorporating prescriptions alongside diagnosis codes improved PPV while maintaining high sensitivity and specificity. Building upon a data-driven framework that integrates variable selection methods and decision tree analysis, this study provides a validated and scalable approach for reliable claims-based research on Behçet’s disease.

Volume 209, Article 106266

Citations: 0

Incorporating patient history into the insulin sensitivity prediction in intensive care by feedforward neural network models

Pub Date : 2026-04-01 Epub Date: 2026-01-08 DOI: 10.1016/j.ijmedinf.2026.106273

Bálint Szabó, J. Geoffrey Chase, Balázs Benyó

Background and Objective

Insulin sensitivity prediction is crucial for model-based treatment in Intensive Care Unit patients, particularly those with hyperglycemia. However, predicting insulin sensitivity is challenging due to inter- and intra-patient variability.

Methods

Different neural network models are proposed and compared for predicting insulin sensitivity, including recurrent and feedforward versions of the Classification Deep Neural Network and Mixture Density Network models. These models were trained using 1879 patient records containing 123,988 insulin sensitivity values from three intensive care patient cohorts in three different countries.
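As a rough illustration of what a Mixture Density Network optimizes, the sketch below computes the negative log-likelihood of one observed value under a one-dimensional Gaussian mixture. The abstract does not specify the network architecture, the number of mixture components, or the loss actually used, so the function name `gaussian_mixture_nll` and all of its parameters are assumptions made for illustration only.

```python
import math


def gaussian_mixture_nll(y, weights, means, sigmas):
    """Negative log-likelihood of a scalar y under a 1-D Gaussian mixture.

    In a Mixture Density Network, weights, means and sigmas would be the
    network outputs for one input; here they are passed in directly
    (hypothetical interface, not taken from the paper).
    """
    if not (len(weights) == len(means) == len(sigmas)):
        raise ValueError("weights, means and sigmas must have equal length")
    density = 0.0
    for w, mu, sigma in zip(weights, means, sigmas):
        z = (y - mu) / sigma
        # Weighted Gaussian density of this component evaluated at y.
        density += w * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return -math.log(density)
```

Minimizing this quantity over a training set would push the predicted mixture to place high probability on the observed insulin sensitivity values, which is what makes the output a distribution rather than a single point estimate.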

Results

Results show that using patient history in prediction models can improve the accuracy of insulin sensitivity predictions. The Mixture Density Network model provided more accurate predictions, measured by a problem-specific metric that expresses clinical requirements. We demonstrated that even using up to 12 h of historical data can improve prediction accuracy.

Conclusion

This study highlights the potential of recurrent neural network models in predicting insulin sensitivity in Intensive Care Unit patients. Our findings suggest that using recurrent neural networks and incorporating patient history can lead to more accurate predictions. These results are generalizable due to the large and diverse dataset employed, which included patients from three different cohorts in three care settings.

Citations: 0

Communicable diseases platform (CDP): Real-Time clinical analytics for infections

IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

International Journal of Medical Informatics

Pub Date : 2026-04-01 Epub Date: 2026-01-12 DOI: 10.1016/j.ijmedinf.2026.106277

Manuri De Silva, Alice Voskoboynik, Sailavan Ramesh, Janice Campbell, Saravanan Satkumaran, Daryl R. Cheng

Objective

Communicable diseases, especially seasonal respiratory illnesses, contribute significantly to paediatric hospital presentations and admissions. Existing surveillance systems often require retrospective manual data collation and focus on either demographic or clinical data, not both. The Communicable Diseases Platform (CDP) is a dynamic data platform that aggregates both data types for all communicable disease presentations to The Royal Children’s Hospital Melbourne (RCH).

Methods

In the pilot phase, the CDP extracted de-identified aggregated data from hospital electronic medical records for patients with positive respiratory swabs. A dashboard displayed positivity rate and cumulative hospital admissions trends from 2016 to 2025, further filterable by pathogen, age, presentation type and interventions.
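The two dashboard quantities named above can be sketched in a few lines. This is a minimal illustration only: the abstract does not describe the platform's data model, so the record layout, the field names (`positive`, `tested`, `admissions`) and the weekly counts below are hypothetical placeholders, not data from the CDP.

```python
from itertools import accumulate


def positivity_rate(positive, tested):
    """Share of tested swabs that were positive; None when nothing was tested."""
    if tested == 0:
        return None
    return positive / tested


# Hypothetical weekly aggregates (illustrative values only).
weekly = [
    {"week": "W01", "positive": 12, "tested": 80, "admissions": 3},
    {"week": "W02", "positive": 20, "tested": 100, "admissions": 5},
    {"week": "W03", "positive": 9, "tested": 90, "admissions": 2},
]

rates = [positivity_rate(row["positive"], row["tested"]) for row in weekly]
# Running total of admissions, one entry per week.
cumulative_admissions = list(accumulate(row["admissions"] for row in weekly))
```

Filtering by pathogen, age or presentation type would amount to selecting a subset of records before applying the same two computations.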

Discussion

The CDP improves understanding of clinical profiles, disease burden and seasonal patterns, supporting better outbreak control, patient flow prediction and clinical surveillance. Future developments include immunisation data integration and machine learning algorithm evaluation for real-time vaccine effectiveness estimations and communicable disease predictive modelling.

Citations: 0

“Calibration or contamination?” Reassessing the evaluation of large language models for clinical mortality prediction

IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

International Journal of Medical Informatics

Pub Date : 2026-04-01 Epub Date: 2026-01-14 DOI: 10.1016/j.ijmedinf.2026.106291

Zhihao Lei

Citations: 0

An old disease, a new linguistic challenge for large language models: patient education on psoriasis and psoriatic arthritis in an underrepresented medical language

IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

International Journal of Medical Informatics

Pub Date : 2026-04-01 Epub Date: 2026-01-10 DOI: 10.1016/j.ijmedinf.2025.106246

Ahmet Ugur Atilan, Niyazi Cetin

Objective

Large Language Models (LLMs) are increasingly applied to patient education, yet their performance in languages that are relatively underrepresented in medical-domain corpora and large language model training datasets remains underexplored. Psoriasis and psoriatic arthritis (PsA) are chronic, immune-mediated diseases requiring lifelong patient engagement, making them suitable conditions for evaluating the clarity, reliability, and inclusivity of AI-generated educational content. This study aimed to assess the comprehensibility, scientific reliability, and patient-centered communication of Turkish patient education materials for psoriasis vulgaris and PsA generated by seven state-of-the-art LLMs.

Methods

A cross-sectional analysis compared outputs from ChatGPT-4o, Gemini 2.0 Flash, Claude 3.7 Sonnet, Grok 3, Qwen 2.5, DeepSeek R1, and Mistral Large 2. Brochures were produced using standardized zero-shot prompts and evaluated via the Ateşman readability index and the DISCERN instrument. Overall differences in DISCERN scores across the seven models were assessed using a Friedman test, followed by Bonferroni-adjusted Wilcoxon signed-rank post-hoc analyses.
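The omnibus-then-pairwise testing scheme described above can be sketched as follows. This is a plain-Python illustration, not the authors' analysis code: the abstract does not say what the repeated-measures blocks were (for example DISCERN items or raters), the Friedman statistic below omits the tie correction that statistical packages usually apply, and the helper names are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order (1-based); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic (no tie correction).

    blocks: list of rows, one per block, each holding one score per model.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)


def bonferroni_alpha(alpha, n_groups):
    """Per-comparison significance level for all pairwise post-hoc tests."""
    comparisons = n_groups * (n_groups - 1) // 2
    return alpha / comparisons
```

With seven models there are 21 pairwise comparisons, so a family-wise level of 0.05 corresponds to a per-comparison threshold of 0.05 / 21 for the Wilcoxon signed-rank tests.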

Results

Readability scores ranged from 61.6 to 80.2 (mean = 71.3 ± 6.9), with ChatGPT-4o and Qwen 2.5 generating the most accessible texts. DISCERN reliability scores ranged from 38.5 to 60.5, with Claude 3.7 Sonnet and Gemini 2.0 Flash showing the highest accuracy. Models prioritizing factual precision produced denser language, while conversational models favored fluency but sacrificed depth. Notable variation was observed, with only Claude 3.7 Sonnet and Gemini 2.0 Flash consistently reflecting patient-centered perspectives.
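For readers unfamiliar with the readability scale used here, the sketch below applies the commonly cited Ateşman formula, 198.825 − 40.175 × (syllables per word) − 2.610 × (words per sentence), where higher scores indicate easier text. The tokenization rules, the vowel-counting shortcut for syllables, and the function name `atesman_score` are assumptions for illustration; the abstract does not state how the authors computed the index.

```python
import re

# In Turkish each syllable contains exactly one vowel, so counting vowels
# gives the syllable count (circumflexed vowels included for loanwords).
TURKISH_VOWELS = set("aeıioöuüâîûAEIİOÖUÜÂÎÛ")


def atesman_score(text):
    """Ateşman readability score for a Turkish text (illustrative sketch)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(1 for word in words for ch in word if ch in TURKISH_VOWELS)
    syllables_per_word = syllables / len(words)
    words_per_sentence = len(words) / len(sentences)
    return 198.825 - 40.175 * syllables_per_word - 2.610 * words_per_sentence
```

Because the formula penalizes both long words and long sentences, very short test sentences can score above 100, while real brochures fall in a lower range.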

Conclusion

LLMs showed observable differences in balancing clarity and reliability when generating health education leaflets in Turkish. Most outputs appeared to lack explicit psychosocial framing and emphasis on shared decision-making, which may suggest the need for more culturally adaptive training, clinician oversight, and locally grounded validation frameworks to support safe and inclusive AI-based patient education.

Citations: 0