
Journal of the American Medical Informatics Association: Latest Articles

A SNAPpy use of large language models: using large language models to classify treatment plans in pediatric acute otitis media.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-01 | DOI: 10.1093/jamia/ocaf170
Jessica J Pourian, Ben Michaels, Anh Vo, A Jay Holmgren, Augusto Garcia-Agundez, Valerie Flaherman

Background and significance: Acute otitis media (AOM) is a leading cause of pediatric antibiotic overuse. Safety Net Antibiotic Prescriptions (SNAPs) are recommended for antibiotic stewardship but are difficult to identify due to lack of structured documentation.

Objective: This study validates the accuracy of Versa, a GPT-4o based HIPAA-compliant large language model (LLM), to classify AOM treatment plans from physician notes.

Methods: A retrospective cross-sectional study analyzed pediatric AOM encounters. Multiple prompting strategies were used to classify treatment plans, and the classifications were validated against a representative sample of manual reviews by 2 pediatricians. A locally fine-tuned model, Clinical-Longformer, was also trained and tested against Versa and human review.
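The validation step described above amounts to comparing model labels with reviewer labels on the manually reviewed sample. The following is a minimal sketch of that comparison, not the authors' code: the label names ("immediate", "snap", "watchful_waiting") and the five example encounters are hypothetical, since the abstract does not list the classes used.

```python
def accuracy(predicted, reference):
    """Fraction of encounters where the model label equals the reviewer label."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not reference:
        raise ValueError("no labels to compare")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical labels for five encounters (illustrative only, not study data).
manual = ["immediate", "snap", "watchful_waiting", "snap", "immediate"]
model = ["immediate", "snap", "watchful_waiting", "immediate", "immediate"]

print(accuracy(model, manual))  # 0.8
```

Plain accuracy treats every class equally; with imbalanced treatment-plan classes, per-class precision and recall would likely be worth reporting alongside it.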

Results: In total, 5707 encounters were included; 374 reviewed manually. Zero-shot accuracy was 97.8%; few-shot accuracy was 85%. Clinical-Longformer achieved 93.3% accuracy.

Conclusion: Versa effectively identifies AOM treatment plans, providing a cost-efficient quality improvement tracking tool for prescription practice patterns in pediatric antibiotic stewardship efforts.

Pages: 1947-1951. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646383/pdf/
Citations: 0
A scalable framework for benchmark embedding models in semantic health-care tasks.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-01 | DOI: 10.1093/jamia/ocaf149
Shelly Soffer, Mahmud Omar, Moran Gendler, Benjamin S Glicksberg, Patricia Kovatch, Orly Efros, Robert Freeman, Alexander W Charney, Girish N Nadkarni, Eyal Klang

Objectives: Text embeddings are promising for semantic tasks, such as retrieval augmented generation (RAG). However, their application in health care is underexplored due to a lack of benchmarking methods. We introduce a scalable benchmarking method to test embeddings for health-care semantic tasks.

Materials and methods: We evaluated 39 embedding models across 7 medical semantic similarity tasks using diverse datasets. These datasets comprised real-world patient data (from the Mount Sinai Health System and MIMIC IV), biomedical texts from PubMed, and synthetic data generated with Llama-3-70b. We first assessed semantic textual similarity (STS) by correlating the model-generated similarity scores with noise levels using Spearman rank correlation. We then reframed the same tasks as retrieval problems, evaluated by mean reciprocal rank and recall at k.
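The two retrieval metrics named above, mean reciprocal rank and recall at k, can be sketched as follows. This is an illustrative sketch rather than the benchmark's implementation: it assumes exactly one relevant document per query, and the document IDs and rankings are made up.

```python
def reciprocal_rank(ranked_ids, relevant_id):
    """Return 1/rank of the relevant item in a ranked list, or 0.0 if absent."""
    for rank, item in enumerate(ranked_ids, start=1):
        if item == relevant_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings, relevant_ids):
    """Average reciprocal rank over all queries."""
    total = sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant_ids))
    return total / len(rankings)


def recall_at_k(rankings, relevant_ids, k):
    """Share of queries whose relevant item appears in the top k results."""
    hits = sum(rel in ranked[:k] for ranked, rel in zip(rankings, relevant_ids))
    return hits / len(rankings)


# Hypothetical rankings for three queries; "d1" is the relevant document each time.
rankings = [["d1", "d2", "d3"], ["d2", "d1", "d3"], ["d3", "d2", "d1"]]
relevant = ["d1", "d1", "d1"]

print(round(mean_reciprocal_rank(rankings, relevant), 4))  # 0.6111
print(round(recall_at_k(rankings, relevant, k=2), 4))      # 0.6667
```

Mean reciprocal rank rewards placing the relevant text near the top, while recall at k only asks whether it appears in the first k results, so the two can rank embedding models differently.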

Results: In total, evaluating 2000 text pairs per 7 tasks for STS and retrieval yielded 3.28 million model assessments. Larger models (>7b parameters), such as those based on Mistral-7b and Gemma-2-9b, consistently performed well, especially in long-context tasks. The NV-Embed-v1 model (7b parameters), although top in short tasks, underperformed in long tasks. For short tasks, smaller models such as b1ade-embed (335M parameters) performed on par with the larger models. For long retrieval tasks, the larger models significantly outperformed the smaller ones.

Discussion: The proposed benchmarking framework demonstrates scalability and flexibility, offering a structured approach to guide the selection of embedding models for a wide range of health-care tasks.

Conclusion: By matching the appropriate model with the task, the framework enables more effective deployment of embedding models, enhancing critical applications such as semantic search and retrieval-augmented generation (RAG).

Pages: 1877-1887. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646376/pdf/
Citations: 0
Machine learning based prediction of medication adherence in heart failure using large electronic health record cohort with linkages to pharmacy-fill and neighborhood-level data.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-01 | DOI: 10.1093/jamia/ocaf162
Samrachana Adhikari, Tyrel Stokes, Xiyue Li, Yunan Zhao, Cassidy Fitchett, Nathalia Ladino, Steven Lawrence, Min Qian, Young S Cho, Carine Hamo, John A Dodson, Rumi Chunara, Ian M Kronish, Amrita Mukhopadhyay, Saul B Blecker

Objective: While timely interventions can improve medication adherence, it is challenging to identify which patients are at risk of nonadherence at the point of care. We aim to develop and validate flexible machine learning (ML) models to predict a continuous measure of adherence to guideline-directed medication therapies (GDMTs) for heart failure (HF).

Materials and methods: We utilized a large electronic health record (EHR) cohort of 34,697 HF patients seen at NYU Langone Health with an active prescription for ≥1 GDMT between April 01, 2021 and October 31, 2022. The outcome was adherence to GDMT, measured as the proportion of days covered (PDC) at 6 months following a clinical encounter. Over 120 predictors included patient-, therapy-, healthcare-, and neighborhood-level factors guided by the World Health Organization's model of barriers to adherence. We compared the performance of several ML models and their ensemble (superlearner) with a traditional regression model (ordinary least squares, OLS) for predicting PDC, using mean absolute error (MAE) averaged across 10-fold cross-validation, the % increase in MAE relative to superlearner, and the predictive difference across deciles of predicted PDC.
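The outcome above, proportion of days covered, can be illustrated with a short sketch. It makes several assumptions that the abstract does not state: the 6-month window is taken as 180 days, fills are given as hypothetical (fill date, days supplied) pairs, and overlapping supply is counted once rather than shifted forward, which some PDC definitions do instead.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, window_start, window_days):
    """Share of days in the window with medication on hand.

    fills: iterable of (fill_date, days_supply) pairs.
    Overlapping supply is counted once (no carry-forward).
    """
    window_end = window_start + timedelta(days=window_days)  # exclusive bound
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if window_start <= day < window_end:
                covered.add(day)
    return len(covered) / window_days


# Hypothetical fill history (illustrative only, not study data).
fills = [
    (date(2021, 4, 1), 30),
    (date(2021, 5, 15), 30),
    (date(2021, 6, 10), 30),  # overlaps the previous fill by 4 days
]
pdc = proportion_of_days_covered(fills, window_start=date(2021, 4, 1), window_days=180)
print(f"{pdc:.3f}")  # 0.478
```

Here 86 distinct days are covered out of 180. Because PDC is a continuous value between 0 and 1, it suits the regression-style models compared in the study rather than a binary adherent/nonadherent classifier.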

Results: Superlearner, a flexible nonparametric prediction approach, demonstrated superior prediction performance. Superlearner and quantile random forest had the lowest MAE (mean [95% CI] = 18.9% [18.7%-19.1%] for both), followed by MAEs for quantile neural network (19.5% [19.3%-19.7%]) and kernel support vector regression (19.8% [19.6%-20.0%]). Gradient boosted trees and OLS were the 2 worst performing models with 17% and 14% higher MAEs, respectively, relative to superlearner. Superlearner demonstrated improved predictive difference.
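As a rough reading of these figures, the sketch below shows how MAE is computed and what the reported relative increases would imply in absolute terms. The implied values are back-of-envelope estimates, not numbers reported in the abstract; they assume "17% and 14% higher" is relative to the superlearner mean MAE of 18.9%, and the PDC values passed to the MAE function are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def implied_mae(reference_mae, relative_increase):
    """MAE implied by a relative increase over a reference model's MAE."""
    return reference_mae * (1.0 + relative_increase)


# Hypothetical observed vs predicted PDC values (illustrative only).
print(round(mean_absolute_error([0.9, 0.5, 0.2], [0.7, 0.6, 0.4]), 4))  # 0.1667

superlearner_mae = 18.9  # percentage points, as reported in the abstract
for name, increase in [("gradient boosted trees", 0.17), ("OLS", 0.14)]:
    # Estimated from the relative increase; not a reported figure.
    print(f"{name}: about {implied_mae(superlearner_mae, increase):.1f}%")
# gradient boosted trees: about 22.1%
# OLS: about 21.5%
```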

Conclusion: This development-phase study suggests the potential of linked EHR-pharmacy data and ML to identify HF patients who will benefit from medication adherence interventions.

Discussion: Fairness evaluation and external validation are needed prior to clinical integration.

Pages: 1822-1832. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646373/pdf/
Citations: 0
Predicting treatment retention in medication for opioid use disorder: a machine learning approach using NLP and LLM-derived clinical features.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-01 | DOI: 10.1093/jamia/ocaf157
Fateme Nateghi Haredasht, Ivan Lopez, Steven Tate, Pooya Ashtari, Min Min Chan, Deepali Kulkarni, Chwen-Yuen Angie Chen, Maithri Vangala, Kira Griffith, Bryan Bunning, Adam S Miner, Tina Hernandez-Boussard, Keith Humphreys, Anna Lembke, L Alexander Vance, Jonathan H Chen

Objective: Building upon our previous work on predicting treatment retention in medications for opioid use disorder, we aimed to improve 6-month retention prediction in buprenorphine-naloxone (BUP-NAL) therapy by incorporating features derived from large language models (LLMs) applied to unstructured clinical notes.

Materials and methods: We used de-identified electronic health record (EHR) data from Stanford Health Care (STARR) for model development and internal validation, and the NeuroBlu behavioral health database for external validation. Structured features were supplemented with 13 clinical and psychosocial features extracted from free-text notes using the CLinical Entity Augmented Retrieval pipeline, which combines named entity recognition with LLM-based classification to provide contextual interpretation. We trained classification (Logistic Regression, Random Forest, XGBoost) and survival models (CoxPH, Random Survival Forest, Survival XGBoost), evaluated using Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) and C-index.
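The C-index used to evaluate the survival models can be sketched as Harrell's concordance index: among comparable patient pairs, the fraction in which the patient who left treatment earlier was assigned the higher risk. This is a simplified illustration with hypothetical follow-up times and scores, not the study's evaluation code; it ignores tied event times and counts tied risk scores as half-concordant.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    times: observed follow-up time per patient.
    events: True if the event (treatment discontinuation) was observed.
    risk_scores: higher score means higher predicted risk of the event.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had the event
            # strictly before patient j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: two patients discontinued at 30 and 90 days,
# two were still retained (censored) at 180 days.
times = [30, 90, 180, 180]
events = [True, True, False, False]
risk = [0.9, 0.4, 0.5, 0.1]

print(concordance_index(times, events, risk))  # 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which C-index results are interpreted.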

Results: XGBoost achieved the highest classification performance (ROC-AUC = 0.65). Incorporating LLM-derived features improved model performance across all architectures, with the largest gains observed in simpler models such as Logistic Regression. In time-to-event analysis, Random Survival Forest and Survival XGBoost reached the highest C-index (≈0.65). SHapley Additive exPlanations analysis identified LLM-extracted features like Chronic Pain, Liver Disease, and Major Depression as key predictors. We also developed an interactive web tool for real-time clinical use.

Discussion: Features extracted using NLP and LLM-assisted methods improved model accuracy and interpretability, revealing valuable psychosocial risks not captured in structured EHRs.

Conclusion: Combining structured EHR data with LLM-extracted features moderately improves BUP-NAL retention prediction, enabling personalized risk stratification and advancing AI-driven care for substance use disorders.

Pages: 1865-1876. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646374/pdf/
Citations: 0
Supporting public transit research in healthcare settings: testing a free, fast, and secure method for routing public transit from patient address to the point of care.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-01 | DOI: 10.1093/jamia/ocaf161
Sinan L Aktay, Ozan A Aktay, Samia Menon, Shuo Jim Huang, Rozalina G McCoy

Objectives: Gaps in transportation, particularly public transit, are a significant barrier to accessible, high-quality healthcare. Health systems, payors, and regulatory bodies recognize the need to identify and address these gaps. However, clinical research examining public transportation accessibility and its impacts on healthcare utilization, outcomes, and costs remains limited. Existing tools used for studying public transit are generally non-HIPAA compliant, expensive, proprietary, and/or difficult to use. A tool addressing these concerns is needed to enable the incorporation of transportation variables into research and clinical care settings.

Materials and methods: We developed and implemented a novel framework for building a public transit routing system that comprises free, publicly available data and offline software to maintain HIPAA compliance. The system consists of a transit router and a geocoder for converting addresses into coordinates.

Results: A total of 463 879 of 505 379 (∼91.8%) Baltimore, Maryland, addresses were successfully routed to the University of Maryland Medical Center in 24 hours of compute time. A significant portion of journeys consisted of walking (36% of median trip time) or using a transit vehicle (57.2%). Testing the router with varying random-access memory levels showed a plateau in routing speed between 12 and 20 GB. The geocoding approach is >90% consistent with a widely used but non-HIPAA compliant geocoder.
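The headline figures can be checked with a few lines of arithmetic. The throughput numbers below are derived estimates, not values reported in the abstract, and they assume the 24 hours of compute time applies to the successfully routed addresses; if it covers all attempted addresses, the per-address rate would be somewhat higher.

```python
routed = 463_879      # addresses successfully routed (from the abstract)
attempted = 505_379   # addresses attempted (from the abstract)
compute_hours = 24    # total compute time (from the abstract)

success_rate = routed / attempted
routes_per_hour = routed / compute_hours
routes_per_second = routes_per_hour / 3600

print(f"success rate: {success_rate:.1%}")             # success rate: 91.8%
print(f"routes per hour: {routes_per_hour:,.0f}")      # routes per hour: 19,328
print(f"routes per second: {routes_per_second:.2f}")   # routes per second: 5.37
```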

Discussion: The methodology and step-by-step guidance shared in this study can allow researchers, public health professionals, not-for-profit agencies, and other stakeholders to efficiently, effectively, and safely incorporate public transportation information into their work.

Conclusion: Public transportation routing using freely available data and software is possible in a HIPAA-compliant manner.

Pages: 1802-1810. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646371/pdf/
Citations: 0
Should we synthesize more than we need: impact of synthetic data generation for high-dimensional cross-sectional medical data.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-01 | DOI: 10.1093/jamia/ocaf169
Lisa Pilgram, Samer El Kababji, Dan Liu, Khaled El Emam

Objective: In medical research and education, generative artificial intelligence/machine learning (AI/ML) models to synthesize artificial medical data can enable the sharing of high-quality data while preserving the privacy of patients. Given that such data is often high-dimensional, a relevant consideration is whether to synthesize the entire dataset when only a task-relevant subset is needed. This study evaluates how the number of variables in training impacts fidelity, utility, and privacy of the synthetic data (SD).

Material and methods: We used 12 cross-sectional medical datasets, defined a downstream task with corresponding core variables, and derived 6354 variants by adding adjunct variables to the core. SD was generated using 7 different generative models and evaluated for fidelity, downstream utility, and privacy. Mixed-effect models were used to assess the effect of adjunct variables on the respective evaluation metric, accounting for the medical dataset as a random component.

Results: Fidelity was unaffected by the number of adjunct variables in 5/7 synthetic data generation (SDG) models. Similarly, downstream utility remained stable in 6/7 (predictive task) and 5/7 (inferential task) SDG models. Where significant effects were observed, they were minimal, resulting, for example, in a 0.05 decrease in Area under the Receiver Operating Characteristic curve (AUROC) when adding 120 variables. Privacy was not impacted by the number of adjunct variables.

Discussion: Our findings show that fidelity, utility, and privacy are preserved when generating a more comprehensive medical dataset than the task-relevant subset.

Conclusion: Our findings support a cost-effective, utility-preserving, and privacy-preserving way of implementing SDG in medical research and education.

{"title":"Should we synthesize more than we need: impact of synthetic data generation for high-dimensional cross-sectional medical data.","authors":"Lisa Pilgram, Samer El Kababji, Dan Liu, Khaled El Emam","doi":"10.1093/jamia/ocaf169","DOIUrl":"10.1093/jamia/ocaf169","url":null,"abstract":"<p><strong>Objective: </strong>In medical research and education, generative artificial intelligence/machine learning (AI/ML) models to synthesize artificial medical data can enable the sharing of high-quality data while preserving the privacy of patients. Given that such data is often high-dimensional, a relevant consideration is whether to synthesize the entire dataset when only a task-relevant subset is needed. This study evaluates how the number of variables in training impacts fidelity, utility, and privacy of the synthetic data (SD).</p><p><strong>Material and methods: </strong>We used 12 cross-sectional medical datasets, defined a downstream task with corresponding core variables, and derived 6354 variants by adding adjunct variables to the core. SD was generated using 7 different generative models and evaluated for fidelity, downstream utility, and privacy. Mixed-effect models were used to assess the effect of adjunct variables on the respective evaluation metric, accounting for the medical dataset as a random component.</p><p><strong>Results: </strong>Fidelity was unaffected by the number of adjunct variables in 5/7 SDG models. Similarly, downstream utility remained stable in 6/7 (predictive task) and 5/7 (inferential task) SDG models. Where significant effects were observed, they were minimal, resulting, for example, in a 0.05 decrease in Area under the Receiver Operating Characteristic curve (AUROC) when adding 120 variables. 
Privacy was not impacted by the number of adjunct variables.</p><p><strong>Discussion: </strong>Our findings show that fidelity, utility, and privacy are preserved when generating a more comprehensive medical dataset than the task-relevant subset.</p><p><strong>Conclusion: </strong>Our findings support a cost-effective, utility, and privacy-preserving way of implementing SDG into medical research and education.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1843-1854"},"PeriodicalIF":4.6,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646385/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145259724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Including AI in diffusion-weighted breast MRI has potential to increase reader confidence and reduce workload.
IF 4.6 Zone 2 Medicine Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-01 DOI: 10.1093/jamia/ocaf156
Dimitrios Bounias, Lina Simons, Michael Baumgartner, Chris Ehring, Peter Neher, Lorenz A Kapsner, Balint Kovacs, Ralf Floca, Paul F Jaeger, Jessica Eberle, Dominique Hadler, Frederik B Laun, Sabine Ohlmeyer, Lena Maier-Hein, Michael Uder, Evelyn Wenkel, Klaus H Maier-Hein, Sebastian Bickelhaupt

Objectives: Breast diffusion-weighted imaging (DWI) has shown potential as a standalone imaging technique for certain indications, eg, supplemental screening of women with dense breasts. This study evaluates an artificial intelligence (AI)-powered computer-aided diagnosis (CAD) system for clinical interpretation and workload reduction in breast DWI.

Materials and methods: This retrospective, IRB-approved study included 824 examinations for model development (2017-2020) and 235 for evaluation (01/2021-06/2021). Three readers interpreted the examinations either with the AI-CAD or manually. BI-RADS-like (Breast Imaging Reporting and Data System) classification was based on DWI. Histopathology served as ground truth. The model was nnDetection-based, trained using 5-fold cross-validation and ensembling. Statistical significance was determined using McNemar's test. Inter-rater agreement was calculated using Cohen's kappa. Model performance was calculated using the area under the receiver operating characteristic curve (AUC).
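The two statistics named above can be sketched in a few lines of standard-library Python. This is a minimal illustration, assuming two raters who label the same items and an exact (binomial) form of McNemar's test. The abstract does not say which variant of the test the authors used, and the function names are hypothetical.

```python
from collections import Counter
from math import comb


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's marginal label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b and c are the discordant-pair counts: cases where only the first
    reading (b) or only the second reading (c) gave the result of interest.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
    print(mcnemar_exact(0, 5))
```

Kappa corrects raw agreement for the agreement two raters would reach by chance, which is why it is preferred over simple percent agreement when one category dominates.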

Results: The AI-augmented approach significantly reduced BI-RADS-like 3 calls in breast DWI by 29% (P = .019) and increased inter-rater agreement (0.57 ± 0.10 vs 0.49 ± 0.11), while preserving diagnostic accuracy. Two of the three readers detected more malignant lesions (63/69 vs 59/69 and 64/69 vs 62/69) with the AI-CAD. The AI model achieved an AUC of 0.78 (95% CI: [0.72, 0.85]; P < .001), which increased for women at screening age to 0.82 (95% CI: [0.73, 0.90]; P < .001), indicating a potential for workload reduction of 20.9% at 96% sensitivity.
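A figure such as "workload reduction at a given sensitivity" can be understood as a triage trade-off: pick the highest score threshold that still flags the required share of malignant cases, then count how many examinations fall below it. The sketch below illustrates that idea only. It is an assumption about how such a number might be derived, not the authors' procedure, and the function name and toy scores are hypothetical.

```python
import math


def triage_at_sensitivity(scores, labels, target_sensitivity=0.96):
    """Return (threshold, achieved sensitivity, share of exams below threshold).

    scores: model outputs, higher meaning more suspicious.
    labels: 1 for malignant, 0 for benign (ground truth).
    Illustrative sketch; not the study's evaluation code.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = sorted(s for s, y in zip(scores, labels) if y == 1)
    if not positives:
        raise ValueError("need at least one malignant case")
    # How many malignant cases may fall below the threshold without
    # dropping under the target sensitivity (small epsilon guards rounding).
    allowed_misses = math.floor(len(positives) * (1 - target_sensitivity) + 1e-9)
    threshold = positives[allowed_misses]  # lowest score that must still be flagged
    sensitivity = sum(s >= threshold for s in positives) / len(positives)
    ruled_out = sum(s < threshold for s in scores)
    return threshold, sensitivity, ruled_out / len(scores)


if __name__ == "__main__":
    toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.05, 0.6, 0.8, 0.9]
    toy_labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
    print(triage_at_sensitivity(toy_scores, toy_labels))
```

In practice the threshold would be fixed on development data and then applied unchanged to the evaluation set, so that the reported sensitivity is not tuned on the cases it is measured on.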

Discussion and conclusion: Breast DWI might benefit from AI support. In our study, AI showed potential for reduction of BI-RADS-like 3 calls and increase of inter-rater agreement. However, given the limited study size, further research is needed.

FHIR-Former: enhancing clinical predictions through Fast Healthcare Interoperability Resources and large language models.
IF 4.6 Zone 2 Medicine Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-01 DOI: 10.1093/jamia/ocaf165
Merlin Engelke, Giulia Baldini, Jens Kleesiek, Felix Nensa, Amin Dada

Objective: To address the challenges of data heterogeneity and manual feature engineering in clinical predictive modeling, we introduce FHIR-Former, an open-source framework integrating Fast Healthcare Interoperability Resources (FHIR) with large language models (LLMs) to automate and standardize clinical prediction tasks.

Materials and methods: FHIR-Former dynamically processes structured (eg, lab results, medications) and unstructured (eg, clinical notes) data from FHIR resources. The pipeline supports multiple classification tasks, including 30-day readmission, imaging study prediction, and ICD code classification. Leveraging open-source LLMs (GeBERTa), we trained models on 1.1 million data points across ten FHIR resources using retrospective inpatient data (2018-2024). Hyperparameters were optimized via Bayesian methods, and outputs were mapped to FHIR RiskAssessment resources for interoperability.
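As a rough illustration of the last step, the sketch below wraps a predicted probability in a FHIR RiskAssessment resource represented as a plain Python dictionary. The element names (status, subject, method, prediction.outcome, prediction.probabilityDecimal) follow the FHIR R4 RiskAssessment definition; the function name, the patient identifier, and the label texts are hypothetical, and the actual mapping used by FHIR-Former is not described in the abstract.

```python
import json


def to_risk_assessment(patient_id, outcome_text, probability,
                       method_text="example prediction model"):
    """Build a minimal FHIR R4 RiskAssessment resource as a dict.

    Illustrative only: a real deployment would add coded concepts,
    timestamps, and a reference to the encounter being assessed.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return {
        "resourceType": "RiskAssessment",
        "status": "final",
        "subject": {"reference": f"Patient/{patient_id}"},
        "method": {"text": method_text},
        "prediction": [
            {
                "outcome": {"text": outcome_text},
                "probabilityDecimal": round(probability, 4),
            }
        ],
    }


if __name__ == "__main__":
    resource = to_risk_assessment("example-123", "30-day readmission", 0.73)
    print(json.dumps(resource, indent=2))
```

Writing predictions back as a standard resource means downstream systems can read them with the same FHIR tooling they already use for clinical data.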

Results: FHIR-Former achieved an F1-score of 70.7% and accuracy of 72.9% for 30-day readmission, 51.8% F1-score (88.1% accuracy) for mortality prediction, and 61% macro F1-score for imaging study classification. The ICD code prediction model attained 94% accuracy. The framework showed promising performance for readmission prediction and scaled across tasks without manual feature engineering.
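The gap between F1-score and accuracy in results like these is typical of imbalanced outcomes, where correctly predicted negatives raise accuracy but do not enter the F1 calculation. The sketch below shows the arithmetic on made-up confusion-matrix counts. The counts and function names are illustrative assumptions and do not come from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("counts must not all be zero")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def macro_f1(per_class_counts):
    """Unweighted mean of per-class F1; each item is a (tp, fp, fn) tuple."""
    scores = []
    for tp, fp, fn in per_class_counts:
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up example: a rare outcome (10 of 100 cases are positive).
    print(binary_metrics(tp=5, fp=5, fn=5, tn=85))
    print(macro_f1([(5, 5, 5), (10, 0, 0)]))
```

In the made-up example, 85 true negatives lift accuracy to 0.9 while F1 stays at 0.5, which is the same pattern as a high accuracy paired with a much lower F1-score for a rare outcome.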

Discussion: FHIR-Former eliminates institution-specific preprocessing by adapting to diverse FHIR implementations, enabling seamless integration of multimodal data. Its configurable architecture outperformed prior frameworks reliant on static inputs or limited to unstructured text. Real-time risk scores embedded in FHIR servers enhance clinical workflows without disrupting existing practices.

Conclusion: By harmonizing FHIR standardization with LLM flexibility, FHIR-Former advances scalable, interoperable predictive modeling in healthcare. The open-source framework facilitates automation, improves resource allocation, and supports personalized decision-making, bridging gaps between AI innovation and clinical practice.

Large language models accurately identify immunosuppression in intensive care unit patients.
IF 4.6 Zone 2 Medicine Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-01 DOI: 10.1093/jamia/ocaf141
Vijeeth Guggilla, Mengjia Kang, Melissa J Bak, Steven D Tran, Anna Pawlowski, Prasanth Nannapaneni, Luke V Rasmussen, Daniel Schneider, Helen K Donnelly, Ankit Agrawal, David Liebovitz, Alexander V Misharin, G R Scott Budinger, Richard G Wunderink, Theresa L Walunas, Catherine A Gao

Objective: Rule-based structured data algorithms and natural language processing (NLP) approaches applied to unstructured clinical notes have limited accuracy and poor generalizability for identifying immunosuppression. Large language models (LLMs) may effectively identify patients with heterogeneous types of immunosuppression from unstructured clinical notes. We compared the performance of LLMs applied to unstructured notes for identifying patients with immunosuppressive conditions or immunosuppressive medication use against 2 baselines: (1) structured data algorithms using diagnosis codes and medication orders and (2) NLP approaches applied to unstructured notes.

Materials and methods: We used hospital admission notes from a primary cohort of 827 intensive care unit (ICU) patients at Northwestern Memorial Hospital and a validation cohort of 200 ICU patients at Beth Israel Deaconess Medical Center, along with diagnosis codes and medication orders from the primary cohort. We evaluated the performance of structured data algorithms, NLP approaches, and LLMs in identifying 7 immunosuppressive conditions and 6 immunosuppressive medications.

Results: In the primary cohort, structured data algorithms achieved peak F1 scores ranging from 0.30 to 0.97 for identifying immunosuppressive conditions and medications. NLP approaches achieved peak F1 scores ranging from 0 to 1. GPT-4o outperformed or matched structured data algorithms and NLP approaches across all conditions and medications, with F1 scores ranging from 0.51 to 1. GPT-4o also performed impressively in our validation cohort (F1 = 1 for 8/13 variables).

Discussion: LLMs, particularly GPT-4o, outperformed structured data algorithms and NLP approaches in identifying immunosuppressive conditions and medications with robust external validation.

Conclusion: LLMs can be applied for improved cohort identification for research purposes.

Efficient extraction of medication information from clinical notes: an evaluation in 2 languages.
IF 4.6 Zone 2 Medicine Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-01 DOI: 10.1093/jamia/ocaf113
Thibaut Fabacher, Erik-André Sauleau, Emmanuelle Arcay, Bineta Faye, Maxime Alter, Archia Chahard, Nathan Miraillet, Adrien Coulet, Aurélie Névéol

Objective: To evaluate the accuracy, computational cost, and portability of a new natural language processing (NLP) method for extracting medication information from clinical narratives.

Materials and methods: We propose an original transformer-based architecture for the extraction of entities and their relations pertaining to patients' medication regimen. First, we used this approach to train and evaluate a model on French clinical notes, using a newly annotated corpus from Hôpitaux Universitaires de Strasbourg. Second, the portability of the approach was assessed by conducting an evaluation on clinical documents in English from the 2018 n2c2 shared task. Information extraction accuracy and computational cost were assessed by comparison with an available method using transformers.

Results: On the relation extraction task itself, the proposed architecture achieves performance that is competitive with the state of the art on both French and English (F-measures 0.82 and 0.96 vs 0.81 and 0.95), but reduces the computational cost by 10. End-to-end (named entity recognition and relation extraction) F1 scores are 0.69 and 0.82 for the French and English corpora, respectively.

Discussion: While an existing system developed for English notes was deployed in a French hospital setting with reasonable effort, we found that an alternative architecture offered end-to-end drug information extraction with comparable extraction performance and lower computational impact for both French and English clinical text processing.

Conclusion: The proposed architecture can be used to extract medication information from clinical text with high performance and low computational cost, and is therefore well suited to the typically limited IT resources of hospitals.
