
International Journal of Medical Informatics: Latest Articles

A topic modeling analysis of stigma dimensions, social, and related behavioral circumstances in clinical notes among patients with HIV
IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-06 DOI: 10.1016/j.ijmedinf.2026.106269
Ziyi Chen , Yiyang Liu , Mattia Prosperi , Krishna Vaddiparti , Robert L. Cook , Jiang Bian , Yi Guo , Yonghui Wu

Objective

To characterize stigma dimensions, social, and related behavioral circumstances in people living with HIV (PLWHs) seeking care, using natural language processing methods applied to a large collection of electronic health record (EHR) clinical notes from a large integrated health system in the southeast United States.

Methods

We identified a cohort of PLWHs from the University of Florida (UF) Health Integrated Data Repository and performed topic modeling analysis using Latent Dirichlet Allocation (LDA) to uncover stigma-related dimensions and related social and behavioral contexts. Domain experts created a seed list of HIV-related stigma keywords, then applied a snowball strategy to iteratively review notes for additional terms until saturation was reached. To identify more target topics, we tested three keyword-based filtering strategies. The detected topics were evaluated using three widely used metrics and manually reviewed by specialists. Word frequency analysis was used to highlight the prevalent terms associated with each topic. In addition, we conducted topic variation analysis among subgroups to examine differences across age- and sex-specific demographics.
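The keyword-based filtering step can be sketched as follows. This is a minimal illustration, assuming a sentence-level filter that keeps any sentence containing at least one keyword; the three keywords, the naive sentence splitter, and the function names are hypothetical and are not taken from the study.

```python
import re

# Hypothetical seed keywords for illustration only; the study's own list is not reproduced here.
KEYWORDS = ["stigma", "ashamed", "discrimination"]

# One case-insensitive pattern that matches any keyword as a whole word.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.IGNORECASE)


def split_sentences(note: str) -> list[str]:
    """Naive split on ., ! or ? followed by whitespace (an assumption, not a clinical tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def keyword_sentences(note: str) -> list[str]:
    """Keep only the sentences that contain at least one keyword."""
    return [s for s in split_sentences(note) if PATTERN.search(s)]


# Made-up note text.
note = "Patient feels ashamed of diagnosis. Blood pressure stable. Reports stigma at work."
filtered = keyword_sentences(note)
print(filtered)
```

The filtered sentences, rather than whole notes, would then be passed to the topic model, which keeps unrelated clinical content out of the topics.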

Results

We identified 9,140 PLWHs at UF Health and collected 2.9 million clinical notes. Through the iterative keyword approach, we generated a list of 91 keywords associated with HIV-related stigma. Topic modeling on sentences containing at least one keyword uncovered a wide range of topic themes associated with HIV-related stigma and with social and related behavioral circumstances, including “Mental Health Concern and Stigma”, “Social Support and Engagement”, “Limited Healthcare Access and Severe Illness”, “Missed Appointments and HIV Care Monitoring”, “Treatment Refusal and Isolation”, “Intimate Partner Violence and Relationship Concerns”, “Fear of Falling and Physical Health Concerns”, “Substance Abuse”, and “Food Insecurity and Resource Scarcity”. Topic variation analysis across sex and age subgroups revealed no substantial difference between males and females; however, differences were observed across age groups. For example, “Fear of Falling and Physical Health Concerns” was notably more prevalent among older adults.

Conclusion

Extracting and understanding HIV-related stigma and associated social and behavioral circumstances from EHR clinical notes enables scalable, time-efficient assessment and overcomes the limitations of traditional questionnaires. Findings from this research provide actionable insights to inform patient care and interventions to improve HIV-care outcomes.
Volume 209, Article 106269
Citations: 0
Predicting emergency mortality risk in traumatic brain injury: comparative analysis of machine learning and large language model GPT-5
IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-05 DOI: 10.1016/j.ijmedinf.2026.106268
Kuan-Chi Tu , Yung-De Kuo , Tee-Tau Eric Nyam , Yu-Shan Ma , Mei-I Sung , Chung-Feng Liu , Ching-Lung Kuo

Background

Artificial intelligence (AI) has become increasingly important in predicting outcomes of traumatic brain injury (TBI). Traditional machine learning (ML) models, such as support vector machines (SVMs), have shown high accuracy, whereas the potential of large language models (LLMs) in structured clinical prediction remains underexplored.

Purpose

This study compared the predictive performance of ML and LLM approaches (ChatGPT-5 with Thinking response mode) using the same TBI dataset and evaluated the impact of prompting strategies and threshold calibration on model reliability and clinical applicability.

Methods

A dataset of 5,475 TBI cases with 12 clinical features was used to build an SVM model and four LLM strategies: zero-shot GPT, few-shot GPT, few-shot + chain-of-thought (CoT) GPT, and CoT-only GPT. Performance was evaluated by accuracy, sensitivity, specificity, and ROC-AUC under fixed and balanced thresholds.
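The abstract does not define how the balanced threshold was chosen. A minimal sketch, assuming it means the cut-off that maximizes Youden's J (sensitivity + specificity - 1), is given below; the labels, risk scores, and function names are made up for illustration.

```python
def sens_spec(y_true, scores, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_threshold(y_true, scores):
    """Pick the candidate threshold with the largest Youden's J (assumed criterion)."""
    def youden(t):
        sens, spec = sens_spec(y_true, scores, t)
        return sens + spec - 1
    return max(sorted(set(scores)), key=youden)


# Toy data (made up): 1 = died, 0 = survived; scores are predicted risks.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.90]

fixed = sens_spec(y_true, scores, 0.5)      # fixed cut-off at 0.5
best_t = balanced_threshold(y_true, scores)  # tuned cut-off
tuned = sens_spec(y_true, scores, best_t)
print(fixed, best_t, tuned)
```

On this toy data the tuned cut-off trades some specificity for sensitivity, which is the kind of shift threshold adjustment is meant to produce when a model's raw scores are poorly calibrated.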

Results

The SVM model achieved the best overall performance (AUC = 0.920). After threshold adjustment, all LLMs reached comparable AUCs (0.902–0.919). Few-shot GPT most closely matched the SVM, CoT + few-shot achieved the highest sensitivity, and CoT-only favored specificity.

Conclusion

Proper threshold calibration enables LLMs to approximate ML accuracy while offering rapid deployment and interpretability. Prompt engineering combined with adaptive cut-off tuning may enhance clinical usability of LLM-based prediction systems.
Volume 209, Article 106268
Citations: 0
Development and validation of data-driven, decision tree–based algorithms for identifying Behçet’s disease in claims data
IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-05 DOI: 10.1016/j.ijmedinf.2026.106266
Ken-ei Sada , Yoshia Miyawaki , Ryo Yanai , Takashi Kida , Akira Onishi , Ryusuke Yoshimi , Kunihiro Ichinose , Yasuhiro Shimojima

Objective

To develop and externally validate novel, data-driven algorithms that are based on appropriate variable selection methods for identifying patients with Behçet’s disease in Japan.

Methods

This retrospective cross-sectional study included 13,538 patients from six tertiary hospitals (November–December 2023). One year of claims data was linked to chart-confirmed Behçet’s disease diagnoses. Patients were randomly divided into training (n = 8,811) and test (n = 3,775) sets, with external validation (n = 952) from another hospital. Feature selection among Behçet’s disease-coded patients used the Least Absolute Shrinkage and Selection Operator, Boruta, and Recursive Feature Elimination. The diagnostic performance of the rule-based algorithms, which were derived from the decision tree models, was evaluated using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value, and F1 score.

Results

Diagnosis codes alone achieved high sensitivity (1.000) and specificity (0.992) but modest PPV (0.767, test set; 0.850, external validation). Incorporating sulphamethoxazole–trimethoprim and colchicine prescriptions improved the positive predictive value, which was 0.793 in the test set and 0.865 in external validation.
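A modest PPV alongside near-perfect sensitivity and specificity is what Bayes' rule predicts when the condition is rare. The sketch below uses the reported sensitivity and specificity; the prevalence values are hypothetical and are not reported in the abstract.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity are the reported values; the prevalences are hypothetical.
for prevalence in (0.01, 0.025, 0.10):
    print(prevalence, round(ppv(1.000, 0.992, prevalence), 3))
```

Because false positives scale with the much larger disease-free group, even a 0.8 % false-positive rate noticeably dilutes PPV at low prevalence, which is why adding a second criterion such as prescriptions can help.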

Conclusion

Incorporating prescriptions alongside diagnosis codes improved PPV while maintaining high sensitivity and specificity. Building upon a data-driven framework that integrates variable selection methods and decision tree analysis, this study provides a validated and scalable approach for reliable claims-based research on Behçet’s disease.
Volume 209, Article 106266
Citations: 0
Evaluation of electronic health record to HL7® FHIR® mappings in pediatric research studies
IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-04 DOI: 10.1016/j.ijmedinf.2026.106265
Maryam Y. Garza , Zhan Wang , Bhargav Adagarla , Michael W. Rutherford , Umit Topaloglu , Daniel K. Benjamin , Kanecia O. Zimmerman , Eric L. Eisenstein , Karan R. Kumar , on behalf of the Best Pharmaceuticals for Children Act – Pediatric Trials Network Steering Committee

Background

eSource technologies that exchange patient data from electronic health records (EHR) to clinical study electronic data capture (EDC) systems can reduce data quality errors and decrease data collection time. However, the availability of site-specific EHR data to support pediatric studies has not been evaluated.

Methods

We used a previously developed data element mapping procedure to evaluate the HL7® FHIR® standard’s coverage in multi-center pediatric clinical studies. Four study sites independently mapped three pediatric studies’ case report forms (CRFs) to their site’s EHR and FHIR® server data elements.

Results

Site investigators mapped 4152 total and 2070 distinct data elements. Only 33.8 % of total CRF data elements (n = 1402) and 27.4 % of distinct data elements (n = 568) could be mapped in FHIR® at the four sites. However, the percentage of total data elements mapped varied by pediatric study (55.3 %, 30.8 %, and 26.2 %) and study site (46.4 %, 32.3 %, 27.8 %, and 26.6 %). The percentage of total CRF data elements mapped was higher in domains containing standard of care data (e.g., Concomitant Medications, Demographics, Diagnosis/Procedures, Medical History, and Vital Signs) and lower in domains containing protocol-specific data (e.g., Adverse Events, Concomitant Medications, Enrollment/Eligibility/Consent, and study treatment-related Labs and Vital Signs).
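The headline percentages follow directly from the reported counts. A short check, using only numbers stated above (the helper name `pct` is an illustrative choice):

```python
def pct(mapped: int, total: int) -> float:
    """Percentage of data elements mapped, rounded to one decimal place."""
    return round(100.0 * mapped / total, 1)


total_pct = pct(1402, 4152)     # all CRF data elements across the four sites
distinct_pct = pct(568, 2070)   # distinct data elements
print(total_pct, distinct_pct)
```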

Conclusions

There is substantial between-study and between-site variability in the percentage of pediatric study data elements available in FHIR® at study sites. These results suggest that mapping solutions for pediatric studies utilizing eSource technologies will need to be site-specific.
Volume 209, Article 106265
Citations: 0
A machine learning-driven app for predicting the need for post-operative respiratory support in liver transplant recipients
IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-03 DOI: 10.1016/j.ijmedinf.2026.106263
Ying Wang , Yanan Zhou , Yu Gong , Zhenbin Ding , Liuxiao Yang , Ting Wang

Background

Liver transplantation (LT) is a life-saving procedure for patients with end-stage liver disease, yet post-operative complications, particularly the need for respiratory support, remain a significant challenge. We aimed to develop and validate a machine learning (ML)-based predictive tool for postoperative respiratory support requirement in liver transplant recipients.

Methods

This single-center retrospective study was conducted at Zhongshan Hospital, Fudan University (Shanghai, China) from January 2018 to October 2023. Following data preprocessing, key variables were selected through univariate analysis, recursive feature elimination (RFE), Chi-square test, and correlation analysis. Nine ML models were initially constructed and optimized via grid search with 5-fold cross-validation. The final model was selected based on area under the curve (AUC), accuracy, sensitivity, specificity, and F1-score, followed by comparative analysis with conventional scoring systems. Model interpretability was achieved using Shapley additive explanations (SHAP) analysis, providing both global and local explanations. For clinical implementation, we developed an online application platform for real-time prediction.

Results

The study included 1121 liver transplant recipients, divided into a discovery cohort (n = 749) and validation cohort (n = 372). Significant differences (P < 0.05) were observed between patients requiring versus not requiring respiratory support across multiple preoperative, intraoperative, and postoperative parameters. After hyperparameter optimization, the random forest (RF), stochastic gradient boosting (SGB), and logistic regression (LR) models were applied to the validation cohort. RF was selected as the final predictive tool, achieving an AUC of 0.790 (95 % CI: 0.723–0.857) in the test set and 0.713 (95 % CI: 0.658–0.767) in the validation cohort, significantly outperforming both the Model for End-Stage Liver Disease (MELD) and Acute Physiology and Chronic Health Evaluation II (APACHE II) scores. SHAP analysis revealed complex bidirectional relationships between predictors and outcomes, with certain variables showing both protective and risk-enhancing effects depending on clinical context.
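For readers less familiar with the AUC figures quoted above, the metric can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal pure-Python sketch of that definition, on made-up labels and scores, is:

```python
def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (1 = needed respiratory support) and risk scores, made up for illustration.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.4]
auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

This pairwise form is quadratic in sample size, so it is suited to explanation rather than large cohorts, where a rank-based implementation from a statistics library would normally be used.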

Conclusions

Based on large-scale clinical data, we developed a robust predictive model that can effectively assess the need for postoperative respiratory support in liver transplant recipients, thereby facilitating clinical decision-making and potentially improving patient outcomes. However, future multi-center validation is warranted to confirm generalizability.
Volume 209, Article 106263
Citations: 0
Advantages and challenges of tracking ST-segment elevation myocardial infarction patients with a real-time dashboard: A single-centre experience
IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-02 DOI: 10.1016/j.ijmedinf.2026.106261
Egidio de Mattia , Filippo Paoletti , Daniela Pedicino , Giovanna Liuzzo , Carmen Angioletti , Alessia d’Aiello , Alessio Perilli , Andrea Adduci , Giovanni Arcuri , Emilio Meneschincheri , Barbara Ruffo , Melissa D’Agostino , Rita De Donno , Antonio Giulio de Belvis

Background

Timely primary percutaneous coronary intervention (pPCI) is the most important treatment to improve outcomes in ST-segment elevation myocardial infarction (STEMI), with a strong relationship between treatment delays and morbidity and mortality. The present study aims to define the main steps for setting up a real-time digital monitoring dashboard to improve the clinical performance of STEMI management and to evaluate the impact of its implementation on the proportion of patients receiving pPCI within 90 min.

Methods

Setting up the digital monitoring system required the definition of detailed algorithms for the diagnosis, treatment, and rehabilitation/follow-up phases. For patients with a diagnosis of STEMI included in the clinical pathway (CP), a multidisciplinary working group identified (i) rules for flagging patients along the CP, based on specific risk scores, and (ii) the critical points of the CP to be monitored, such as door-to-balloon time, intensive care unit length of stay, and total hospital length of stay. An interrupted time series analysis and multivariable logistic regression models were used to assess changes in the outcome (pPCI within 90 min) after the platform implementation, adjusting for temporal and individual confounders.
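
An interrupted time series is often fitted as a segmented regression with a level-change term and a slope-change term at the intervention point. The following is a minimal sketch of that general technique, not the authors' model: the function name `fit_segmented_its` is hypothetical, the series is synthetic and noise-free, and the abstract's adjustment for individual confounders is not reproduced here.

```python
import numpy as np

def fit_segmented_its(y, intervention_index):
    """Fit y = b0 + b1*t + b2*post + b3*t_since by ordinary least squares.

    b2 is the immediate level change at the intervention;
    b3 is the change in slope after the intervention.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= intervention_index).astype(float)
    t_since = np.where(post == 1.0, t - intervention_index, 0.0)
    X = np.column_stack([np.ones_like(t), t, post, t_since])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    names = ["baseline", "pre_slope", "level_change", "slope_change"]
    return dict(zip(names, coef))

# Synthetic series for illustration only: 12 periods before, 12 after.
t = np.arange(24)
post = (t >= 12).astype(float)
y_demo = 0.40 + 0.002 * t + 0.20 * post + 0.003 * post * (t - 12)
its_fit = fit_segmented_its(y_demo, intervention_index=12)
```

With real data the outcome would usually be a per-period proportion or a patient-level binary indicator, and the error structure (autocorrelation, seasonality) would need attention that this sketch leaves out.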

Results

After the introduction of the dashboard, the proportion of timely pPCI improved from 40 % pre-implementation to 65 % post-implementation. Adjusted models indicated a twofold increase in the odds of meeting the 90-minute benchmark (OR = 2.00; 95 % CI: 0.99–4.12).
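
The reported proportions and odds ratio can be related with a little arithmetic. This is a sketch under stated assumptions rather than the authors' analysis: it derives the crude (unadjusted) odds ratio from the two proportions alone, and the helper `ci_excludes_null` is a hypothetical name for a simple check of whether an interval excludes the null value of 1.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1.0 - p)

def crude_odds_ratio(p_before, p_after):
    """Unadjusted odds ratio implied by two proportions."""
    return odds(p_after) / odds(p_before)

def ci_excludes_null(lower, upper, null=1.0):
    """True when the interval does not contain the null value."""
    return not (lower <= null <= upper)

# 40% timely pPCI before the dashboard, 65% after (figures from the abstract).
crude_or = crude_odds_ratio(0.40, 0.65)
# Reported adjusted interval: 0.99 to 4.12.
interval_excludes_one = ci_excludes_null(0.99, 4.12)
```

The crude value is about 2.79, higher than the adjusted OR of 2.00; the gap presumably reflects the adjustment for temporal and individual confounders. Because the reported interval (0.99–4.12) includes 1, the adjusted estimate falls just short of conventional significance at the 5% level.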

Conclusion

The real-time monitoring system showed a positive impact on the timely management of STEMI, highlighting the potential for improving healthcare efficiency and patient outcomes.
International Journal of Medical Informatics, vol. 209, Article 106261.
Citations: 0
A scoping review: how evaluation methods shape our understanding of ChatGPT’s effectiveness in healthcare
IF 4.1 Zone 2, Medicine Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-31 DOI: 10.1016/j.ijmedinf.2025.106248
Yuanyuan Liu , Yu Zhang, Haoran Mao

Background

The rapid growth in research on ChatGPT’s healthcare applications has led to diverse evaluation methods and substantially heterogeneous findings, undermining evidence reliability and hindering clinical translation.

Objectives

This review aims to examine how different evaluation methods shape our understanding of ChatGPT’s effectiveness in healthcare.

Methods

Following the PRISMA guidelines, this review included 131 peer-reviewed studies published between 2023 and 2024 that assess the use of ChatGPT in medical or healthcare-related contexts, spanning clinical, educational, and diagnostic domains.

Results

The results indicate that predominant evaluation approaches—controlled trial studies, expert assessment studies, measurement-based evaluation studies, and prompt generation analysis studies—systematically influence conclusions about ChatGPT’s performance due to their inherent methodological characteristics, such as subjectivity, objectivity, and differences in ecological validity. Further analysis reveals that ChatGPT’s performance is highly context-dependent, shaped by specific application scenarios, model versions, and prompting strategies.

Conclusions

To address methodological heterogeneity and the lack of standardization, this study recommends multi-method cross-validation strategies and a risk-stratified, standardized evaluation framework. These steps are essential to enhance the scientific rigor and reliability of ChatGPT’s assessment in healthcare and to provide a solid foundation for its clinical integration.
International Journal of Medical Informatics, vol. 209, Article 106248.
Citations: 0
Large language models versus healthcare professionals in providing medical information to patient questions: A systematic review
IF 4.1 Zone 2, Medicine Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-31 DOI: 10.1016/j.ijmedinf.2025.106250
Maud M.G. Jacobs , Jacobien H.F. Oosterhoff , Rintje Agricola , Walter van der Weegen

Objective

The rapid expansion of digital healthcare has increased the volume of patient communication, thereby increasing the workload for healthcare professionals. Large Language Models (LLMs) hold promise for offering automated responses to patient questions relayed through eHealth platforms, yet concerns persist regarding their effectiveness, accuracy, and limitations in healthcare settings. This study aims to evaluate the current evidence on the performance and perceived suitability of LLMs in healthcare, focusing on their role in supporting clinical decision-making and patient communication.

Materials and methods

A systematic search of PubMed and Embase up to June 11, 2025 identified 330 studies, of which 20 met the inclusion criteria, namely a comparison of the accuracy and adequacy of medical information provided by LLMs versus healthcare professionals and guidelines. The search strategy combined terms related to LLMs, healthcare professionals, and patient questions. Risk of bias was assessed with the ROBINS-I tool.

Results

A total of nineteen studies focused on medical specialties and one on the primary care setting. Twelve studies favored the responses generated by LLMs, six reported mixed results, and two favored the healthcare professionals’ response. Bias components generally scored moderate to low, indicating a low risk of bias.

Discussion and conclusions

The review summarizes current evidence on the accuracy and adequacy of medical information provided by LLMs in response to patient questions, compared to healthcare professionals and clinical guidelines. While LLMs show potential as supportive tools in healthcare, their integration should be approached cautiously due to inconsistent performance and possible risks. Further research is essential before widespread adoption.
International Journal of Medical Informatics, vol. 209, Article 106250.
Citations: 0
Leveraging large language models to automate the identification of healthcare access barriers for veterans
IF 4.1 Zone 2, Medicine Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-30 DOI: 10.1016/j.ijmedinf.2025.106247
Sudarshan Srinivasan , Caitlin Rizy , Maria Mahbub , David Bolme , Alina Peluso , Jodie Trafton , Ioana Danciu

Objective

To develop and evaluate an automated system that uses large language models (LLMs) to identify healthcare access barriers, focusing on transportation issues, in veterans’ clinical notes, and to assess the impact of different prompting strategies on classification performance and explanation consistency.

Methods

We developed a hybrid system combining pattern matching for templated notes with LLM analysis for free-text notes. Using 2000 manually annotated clinical notes, we compared four prompting strategies (dual-role short, dual-role long, analysis-first, analysis-only) across Mistral-7B and Llama-3.1 models. We evaluated classification performance using standard metrics and assessed explanation consistency through embedding similarity analysis.
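
The hybrid design described above (deterministic pattern matching for templated notes, LLM analysis for free text) can be sketched as a simple router. Everything specific here is an assumption made for illustration: the template phrases, the prompt wording, the function names, and `fake_llm`, which stands in for a real model call. The abstract does not give the study's actual patterns or prompts.

```python
import re
from typing import Callable, Optional

# Hypothetical template phrases; the study's real patterns are not published here.
TEMPLATE_PATTERNS = [
    (re.compile(r"transportation barrier:\s*yes\b", re.IGNORECASE), True),
    (re.compile(r"transportation barrier:\s*no\b", re.IGNORECASE), False),
]

def match_template(note: str) -> Optional[bool]:
    """Return a deterministic label for templated notes, or None for free text."""
    for pattern, label in TEMPLATE_PATTERNS:
        if pattern.search(note):
            return label
    return None

def build_analysis_first_prompt(note: str) -> str:
    """Ask for the evidence analysis before the final label (analysis-first)."""
    return (
        "Read the clinical note below.\n"
        "Step 1: Quote and analyze any evidence of transportation problems.\n"
        "Step 2: On the last line, answer 'LABEL: yes' or 'LABEL: no'.\n\n"
        f"Note: {note}"
    )

def classify_note(note: str, llm: Callable[[str], str]) -> bool:
    """Route templated notes to pattern matching and all others to the LLM."""
    templated = match_template(note)
    if templated is not None:
        return templated
    reply = llm(build_analysis_first_prompt(note))
    return reply.strip().lower().endswith("label: yes")

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    note_text = prompt.split("Note: ", 1)[1].lower()
    has_barrier = "no ride" in note_text
    return "Analysis: ...\nLABEL: " + ("yes" if has_barrier else "no")
```

For example, `classify_note("Veteran has no ride to clinic.", fake_llm)` goes through the LLM path, while a note containing "Transportation barrier: YES" is resolved by the regular expression without any model call.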

Results

The analysis-first strategy achieved superior performance, with Mistral-7B reaching an F1 score of 0.914, outperforming traditional machine learning approaches (GBM: 0.786, BERT: 0.811). LLMs demonstrated higher explanation consistency within models (mean cosine similarity 0.887–0.908) compared to cross-model similarities (0.767–0.872). Pattern matching successfully handled 6.7% of templated notes deterministically. Mistral-7B showed greater internal consistency but higher abstention rates compared to Llama-3.1.
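
The two kinds of metric reported here, F1 for classification and mean cosine similarity for explanation consistency, can be illustrated as follows. This is a sketch of the standard formulas only: the toy vectors stand in for explanation embeddings, the function names are hypothetical, and the embedding model used in the study is not specified in the abstract.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy 3-dimensional vectors standing in for explanation embeddings.
toy_embeddings = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
consistency = mean_pairwise_cosine(toy_embeddings)
toy_f1 = f1_score(tp=8, fp=2, fn=2)
```

Within-model consistency would average over explanations produced by one model for the same note, and cross-model consistency over explanations from different models; both reduce to the same pairwise computation.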

Conclusion

Requiring LLMs to analyze evidence before classification improves both accuracy and explanation consistency for identifying transportation barriers in clinical notes. This approach enables automated barrier detection at scale while providing clinically relevant explanations, supporting both population-level healthcare planning and individual patient care decisions.
International Journal of Medical Informatics, vol. 209, Article 106247.
Citations: 0
Survey on cancer patients’ attitudes towards AI and data protection: A cross-sectional study from an Italian cancer center
IF 4.1 Zone 2, Medicine Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-30 DOI: 10.1016/j.ijmedinf.2025.106237
Martina Cavallucci , Alice Andalò , Valentina Danesi , Nicola Gentili , Ilaria Massa , Emanuela Scarpi , Maria Chiara Restuccia , Roberto Vespignani , Alice Conficconi , Michela Palleschi , Ugo De Giorgi , Antonino Musolino , Filippo Merloni
Background
Artificial Intelligence (AI) is increasingly integrated into oncology, offering opportunities to improve diagnostics, treatment planning, and operational efficiency. However, patient perspectives on AI, especially regarding data protection and ethical implications, remain underexplored.
Objective
This study investigates cancer patients’ attitudes toward the use of Artificial Intelligence (AI) in healthcare, focusing on their awareness of data protection, perceived risks and benefits, and the conditions under which AI is considered acceptable. It also examines how demographic and educational factors influence patients’ views within the context of an Italian comprehensive cancer center.
Methods
A cross-sectional survey was conducted with 117 cancer patients who completed a 28-item online questionnaire. The survey evaluated levels of AI knowledge, perceptions of data privacy, concerns about AI in medical contexts, and willingness to share health data for research.
Results
Most participants demonstrated moderate awareness of AI (70.1%) and its medical applications (85.5%), with higher familiarity observed among younger and more educated individuals. While data protection understanding varied, 76.9% were willing to share personal health data for research aimed at improving cancer care. Concerns included reduced physician autonomy (52.1%) and diminished physician-patient interaction (63.3%). However, 82.9% of respondents found AI acceptable when clinical decisions remained under physician control. AI was most favorably viewed for administrative support and care process optimization.
Conclusion
Cancer patients generally view AI in healthcare positively, especially when it maintains physician oversight and safeguards data privacy. To ensure equitable and informed adoption, targeted educational initiatives and transparent communication strategies should address generational, educational, and digital literacy differences.
International Journal of Medical Informatics, vol. 209, Article 106237.
Citations: 0