
Journal of the American Medical Informatics Association: latest articles

Determining optimal strategies for personalized atrial fibrillation treatment in intensive care unit patients using a deep learning-based causal inference approach: rhythm and/or rate control.
IF 4.6 | Zone 2, Medicine | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-01 | DOI: 10.1093/jamia/ocaf203
Min Woo Kang, Shin Young Ahn, Yoonjin Kang

Objectives: Atrial fibrillation (AF) is common among intensive care unit (ICU) patients. Effective management of AF in this setting remains a subject of debate, and current guidelines are often derived from outpatient studies. This study aims to evaluate the effectiveness of different AF management strategies (both rhythm and rate control, rhythm control only, rate control only, or no control) in reducing mortality in ICU patients using a deep learning-based causal inference model.

Materials and methods: Data from the Medical Information Mart for Intensive Care (MIMIC)-III and MIMIC-IV were used, covering ICU admissions with documented AF. Exposures were both rhythm and rate control, rhythm control only, rate control only, or no control. A deep learning-based causal inference model estimated treatment effects. Additionally, the characteristics of patients who benefited more from rhythm control than from rate control were identified using treatment effect sizes and multivariable logistic regression.

Results: The study population comprised 13 583 patients. Both rhythm and rate control, rhythm control-only, and rate control-only strategies significantly reduced in-hospital mortality compared to no control, with average treatment effects of -1.23% (-1.43% to -1.03%), -2.32% (-2.48% to -2.15%), and -9.11% (-9.29% to -8.93%), respectively. Rhythm control proved more effective than rate control in specific subgroups: older age, higher maximum heart rate, presence of new-onset AF, absence of hypertension, absence of diabetes, chronic liver disease, not having undergone heart surgery, and the use of vasopressor agents.

Conclusion: Using a deep learning-based causal inference model, we quantified mortality reduction for each treatment strategy and identified the patient characteristics associated with the most favorable outcomes for each strategy.
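
The average treatment effects quoted in the Results are differences in mortality between a treatment strategy and no control, reported with an interval. The abstract does not describe the model itself, so the following is only a minimal sketch of the final arithmetic, assuming some model has already produced per-patient predicted mortality under treatment and under no control. The probabilities, the function names, and the use of a percentile bootstrap for the interval are all illustrative assumptions, not the authors' method.

```python
import random
import statistics


def average_treatment_effect(y_treated, y_control):
    """Mean per-patient difference between predicted outcomes (treated minus control)."""
    return statistics.fmean(t - c for t, c in zip(y_treated, y_control))


def bootstrap_ci(y_treated, y_control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the ATE, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(y_treated)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(statistics.fmean(y_treated[i] - y_control[i] for i in idx))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical predicted in-hospital mortality probabilities for six patients.
p_control = [0.30, 0.22, 0.41, 0.18, 0.35, 0.27]  # assumed: no control
p_treated = [0.25, 0.20, 0.33, 0.19, 0.30, 0.21]  # assumed: some control strategy

ate = average_treatment_effect(p_treated, p_control)
ci_low, ci_high = bootstrap_ci(p_treated, p_control)
print(f"ATE = {ate:.2%} ({ci_low:.2%} to {ci_high:.2%})")
```

A negative value means lower predicted mortality under the treatment strategy, which matches the sign convention of the effects reported in the Results.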

Pages: 679-689 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981632/pdf/
Citations: 0
Knowledge graph-augmented large language models for reconstructing life course risk pathways: a gestational diabetes mellitus-to-dementia case study.
IF 4.6 | Zone 2, Medicine | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-01 | DOI: 10.1093/jamia/ocaf219
Shuang Wang, Yang Zhang, Ying Gao, Xin He, Guanghui Deng, Jian Du

Objectives: To develop and evaluate a knowledge graph-augmented large language model (LLM) framework that synthesizes epidemiological evidence to infer life-course exposure-outcome pathways, using gestational diabetes mellitus (GDM) and dementia as a case study.

Materials and methods: We constructed a causal knowledge graph by extracting empirical epidemiological associations from scientific literature, excluding hypothetical assertions. The graph was integrated with GPT-4 through four graph retrieval-augmented generation (GRAG) strategies to infer bridging variables between early-life exposure (GDM) and later-life outcome (dementia). Semantic triples served as structured inputs to support LLM reasoning. Each GRAG strategy was evaluated by human clinical experts and three LLM-based reviewers (GPT-4o, Llama 3-70B, and Gemini Advanced), assessing scientific reliability, novelty, and clinical relevance.

Results: The GRAG strategy using a minimal set of abstracts specifically related to GDM-dementia bridging variables performed comparably to the strategy using broader sub-community abstracts, and both significantly outperformed approaches using the full GDM- or dementia-related corpus or baseline GPT-4 without external augmentation. The knowledge graph-augmented LLM identified 108 maternal candidate mediators, including validated risk factors such as chronic kidney disease and physical inactivity. The structured approach improved accuracy and reduced confabulation compared to standard LLM outputs.

Discussion: Our findings suggest that augmenting LLMs with epidemiological knowledge graphs enables effective reasoning over fragmented literature and supports the reconstruction of progressive risk pathways. Expert assessments revealed that LLMs may overestimate clinical relevance, highlighting the need for human-AI collaboration in interpretation and application.

Conclusion: Integrating semantic epidemiological knowledge with LLMs via GRAG strategies provides a promising framework for life-course epidemiology, enabling early detection of modifiable risk factors and guiding variable selection in cohort study design.
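
The idea of bridging variables can be illustrated with plain semantic triples: a variable bridges the exposure and the outcome if one triple links the exposure to it and another links it to the outcome. The sketch below is only a toy two-hop lookup under that assumption. The triples, predicate names, and the `bridging_variables` helper are hypothetical and are not taken from the study's knowledge graph or its GRAG strategies.

```python
from collections import defaultdict


def bridging_variables(triples, exposure, outcome):
    """Return nodes M such that exposure -> M and M -> outcome both appear as triples."""
    successors = defaultdict(set)
    for subject, _predicate, obj in triples:
        successors[subject].add(obj)
    candidates = successors.get(exposure, set())
    return sorted(m for m in candidates if outcome in successors.get(m, set()))


# Toy triples for illustration only (subject, predicate, object).
triples = [
    ("gestational diabetes mellitus", "associated_with", "chronic kidney disease"),
    ("chronic kidney disease", "associated_with", "dementia"),
    ("gestational diabetes mellitus", "associated_with", "physical inactivity"),
    ("physical inactivity", "associated_with", "dementia"),
    ("gestational diabetes mellitus", "associated_with", "placeholder variable"),
]

bridges = bridging_variables(triples, "gestational diabetes mellitus", "dementia")
print(bridges)  # ['chronic kidney disease', 'physical inactivity']
```

In a retrieval-augmented setup, triples selected this way could be passed to the LLM as structured context; how the study actually selects and formats them is not specified in the abstract.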

Pages: 632-640 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981626/pdf/
Citations: 0
Testing and evaluation of generative large language models in electronic health record applications: a systematic review.
IF 4.6 | Zone 2, Medicine | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-01 | DOI: 10.1093/jamia/ocaf233
Xinsong Du, Zhengyang Zhou, Yifei Wang, Ya-Wen Chuang, Yiming Li, Richard Yang, Wenyu Zhang, Xinyi Wang, Xinyu Chen, Hao Guan, John Lian, Pengyu Hong, David W Bates, Li Zhou

Background: The use of generative large language models (LLMs) with electronic health record (EHR) data is rapidly expanding to support clinical and research tasks. This systematic review characterizes the clinical fields and use cases that have been studied and evaluated to date.

Methods: We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to conduct a systematic review of articles from PubMed and Web of Science published between January 1, 2023, and November 9, 2024. Studies were included if they used generative LLMs to analyze real-world EHR data and reported quantitative performance evaluations. Through data extraction, we identified clinical specialties and tasks for each included article and summarized evaluation methods.

Results: Of the 18 735 articles retrieved, 196 met our criteria. Most studies focused on radiology (26.0%), oncology (10.7%), and emergency medicine (6.6%). Regarding clinical tasks, clinical decision support made up the largest proportion of studies (62.2%), while summarizations and patient communications made up the smallest, at 5.6% and 5.1%, respectively. In addition, GPT-4 and GPT-3.5 were the most commonly used generative LLMs, appearing in 60.2% and 57.7% of studies, respectively. Across these studies, we identified 22 unique non-NLP metrics and 35 unique NLP metrics. While NLP metrics offer greater scalability, none demonstrated a strong correlation with gold-standard human evaluations.

Conclusion: Our findings highlight the need to evaluate generative LLMs on EHR data across a broader range of clinical specialties and tasks, as well as the urgent need for standardized, scalable, and clinically meaningful evaluation frameworks.

Pages: 743-753 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981627/pdf/
Citations: 0
Beyond metrics to methods: a scoping review of transformers and large language models for detection of social drivers of health in clinical notes.
IF 4.6 | Zone 2, Medicine | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-01 | DOI: 10.1093/jamia/ocaf201
Ahmed Farrag, Ahmed Soliman, Elham Hatef, Amie Goodin, Masoud Rouhizadeh

Objective: This scoping review aimed to (1) map current applications of transformers and large language models (LLMs) for extracting social drivers of health (SDOH) from clinical text, (2) benchmark model performance across SDOH domains, and (3) evaluate methodological rigor to identify research gaps and inform clinical deployment.

Materials and methods: We searched PubMed, Web of Science, Embase, Scopus, and IEEE Xplore for studies applying transformers or LLMs to detect SDOH in clinical narratives. We developed a novel methodological framework integrating (1) hierarchical classification of SDOH domains and transformer/LLM architectures, (2) systematic synthesis of performance metrics, and (3) a 7-domain instrument assessing internal validity, external validity, and reporting transparency.

Results: Forty-two studies met inclusion criteria. Performance varied substantially across SDOH domains. Behavioral Factors achieved the highest median F1-score (0.87), while Health Care Access and Quality showed the lowest performance and greatest variability (median F1 = 0.59). Research concentrated in the United States (85.7%), relied predominantly on private institutional datasets (69%), and focused primarily on critical care populations (45.2%). Methodological assessment revealed critical gaps; only 29% of studies provided annotation guidelines, 24% assessed fairness across demographic groups, and 21% performed external validation.

Discussion: Smaller open-source transformer models show promise for democratizing SDOH detection by achieving competitive performance at lower costs while enabling secure local deployment in resource-limited settings. Advancing clinical readiness requires standardized reporting practices, diverse benchmark datasets across care settings, and systematic equity evaluation to prevent perpetuating health disparities.

Conclusion: Transformer and LLM performance for SDOH detection varied substantially across domains, with encoder-based models excelling at structured tasks and decoder-only models at linguistically complex tasks. Critical gaps in fairness assessment, external validation, and dataset diversity restrict generalizability and readiness for widespread clinical deployment.
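
The median F1-scores quoted in the Results summarize per-study scores within one SDOH domain. As a minimal sketch of that arithmetic, assuming each study reports true positives, false positives, and false negatives for a domain, F1 can be computed per study and the median taken across studies. The study names and counts below are invented for illustration and do not come from the review.

```python
import statistics


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical (tp, fp, fn) counts for one SDOH domain in three studies.
domain_counts = {
    "study_a": (80, 10, 10),
    "study_b": (45, 5, 15),
    "study_c": (30, 20, 20),
}

f1_by_study = {name: f1_score(*counts) for name, counts in domain_counts.items()}
median_f1 = statistics.median(f1_by_study.values())
print({name: round(score, 3) for name, score in f1_by_study.items()})
print(round(median_f1, 3))
```

The median is less sensitive than the mean to a single very low or very high study, which matters when per-domain performance is as variable as the review describes.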

Pages: 754-769 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981650/pdf/
Citations: 0
Compliance and factuality of large language models for clinical research document generation.
IF 4.6 | Zone 2, Medicine | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-01 | DOI: 10.1093/jamia/ocaf174
Zifeng Wang, Junyi Gao, Benjamin Danek, Brandon Theodorou, Ruba Shaik, Shivashankar Thati, Seunghyun Won, Jimeng Sun

Objectives: The performance of large language models (LLMs) in high-stakes, compliance-driven settings such as drafting clinical research documents remains underexplored. This study aims to build a benchmark and an evaluation framework for assessing LLMs' compliance and factuality in generating informed consent forms (ICFs) from clinical trial protocols.

Materials and methods: We introduce InformBench, a benchmark comprising 900 clinical trial documents, and propose an evaluation framework grounded in regulatory guidelines and site-specific consent templates. We assess LLM performance on transforming trial protocols, often hundreds of pages, into concise, patient-facing ICFs. Additionally, we design InformGen, a retrieval-augmented, human-in-the-loop pipeline aimed at improving generation quality.

Results: Baseline LLMs such as GPT-4o achieved only 70%-80% compliance and exhibited factual errors in 18%-43% of cases. In contrast, InformGen substantially improved outputs, achieving nearly 100% regulatory compliance and over 90% factual accuracy, as validated by 5 domain-expert annotators.

Discussion: The study reveals critical limitations in current LLMs for clinical research document drafting, particularly in regulatory sensitivity and factual grounding. Our results highlight the need for domain-specific benchmarks and structured evaluations to support safe deployment in real-world clinical research workflows.

Conclusion: LLMs offer value in clinical research document generation but must be adapted and rigorously evaluated for high-stakes applications. Our benchmark and framework provide a foundation for improving and assessing LLM-generated outputs in compliance-critical domains.

Pages: 563-572 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981641/pdf/
Citations: 0
Digital interdependence: impact of work spillover during clinical team handoffs.
IF 4.6 | Zone 2, Medicine | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-01 | DOI: 10.1093/jamia/ocaf212
Dori A Cross, Josh Weiner, Hannah T Neprash, Genevieve B Melton, Andrew Olson

Objective: To characterize the nature and consequence(s) of interdependent physician electronic health record (EHR) work across inpatient shifts.

Materials and methods: Pooled cross-sectional analysis of EHR metadata associated with hospital medicine patients at an academic medical center, January-June 2022. Using patient-day observation data, we use a mixed effects regression model with daytime physician random effects to examine nightshift behavior (handoff time, total EHR time) as a function of behaviors by the preceding daytime team. We also assess whether nighttime patient deterioration is predicted by team coordination behaviors across shifts.

Results: We observed 19 671 patient days (N = 2708 encounters). Physicians used the handoff tool consistently, generally spending 8-12 minutes per shift editing patient information. When the day service team was more activated (highest tercile of handoff time, overall EHR time), nightshift experienced increased levels of EHR work and patient risk of overnight decline was elevated (ie, busy predicts busy). However, lower levels of dayshift activation were also associated with nightshift spillovers, including higher overnight EHR work and increased likelihood of patient clinical decline. Patient-days in the lowest and highest terciles of dayshift EHR time had a 1 percentage point increased relative risk of overnight decline (baseline prevalence of 4.4%) compared to the middle tercile (P = .04).

Discussion: We find evidence of spillovers in EHR work from dayshift to nightshift. Additionally, the lowest and highest levels of dayshift EHR activity are associated with increased risk of overnight patient decline. Results are associational and motivate further examination of additional confounding factors.

Conclusion: Analyses reveal opportunities to address task interdependence across shifts, using technology to flexibly shape and support collaborative teaming practices in complex clinical environments.

Pages: 603-610. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981644/pdf/
Citations: 0
A novel analysis methodology for assessment of re-identification risks for the National Cancer Institute cancer registry privacy preserving record linkage technique.
IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-03-01 DOI: 10.1093/jamia/ocaf172
Murat Kantarcioglu, Will Howe, Benmei Liu, Valentina Petkov, Esmeralda Casas-Silva, Diana Velasquez-Kolnik, Bradley A Malin, Lynne Penberthy

Objective: The National Cancer Institute (NCI), part of the National Institutes of Health (NIH), supports efforts to address critical challenges in advancing cancer research. As part of this effort, NCI sponsored the development of privacy-preserving record linkage (PPRL) software that transforms identifying patient information into multiple tokens through a set of cryptographically secure keyed hash functions. This project aims to evaluate the PPRL software from the perspective of re-identification risks and to propose effective strategies to sufficiently mitigate these risks.

Materials and methods: To achieve the goals, we developed a novel re-identification risk assessment framework, based on token frequency analysis, to estimate the privacy impact of hashed tokens shared for record linkage. We assessed privacy risk through empirical analysis on a state-level voter registration database, a public dataset commonly used for re-identification, under various scenarios. These scenarios are defined based on several factors, including the size of the dataset used for linkage and a group size parameter that determines when an adversary can claim that a record has been re-identified.

Results: We found that the re-identification risk based on frequency analysis attack is approximately 0.0002 (ie, 2 patients out of 10 000 are potentially identifiable) under reasonable adversarial settings, with a group size parameter of k = 12 and a dataset size of 400 000 patients. Additionally, our analysis reveals a negative correlation between dataset size and re-identification risk.
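The idea of keyed-hash tokens and a group size threshold can be sketched as follows. This is a simplified illustration under stated assumptions, not the NCI PPRL software or the authors' framework: the helper names (`make_token`, `at_risk_fraction`), the choice of HMAC-SHA256, the field normalization, and the rule "a record is at risk when fewer than k records share its token" are all hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def make_token(secret_key: bytes, *fields: str) -> str:
    """Build a keyed-hash token from normalized identifying fields (assumed scheme)."""
    message = "|".join(f.strip().lower() for f in fields).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def at_risk_fraction(tokens, k):
    """Fraction of records whose token is shared by fewer than k records."""
    counts = Counter(tokens)
    at_risk = sum(1 for t in tokens if counts[t] < k)
    return at_risk / len(tokens)


# Hypothetical records: (last name, birth year).
key = b"demo-key-not-for-production"
records = [
    ("Smith", "1970"), ("Smith", "1970"), ("Smith", "1970"),
    ("Jones", "1980"),
    ("Lee", "1990"), ("Lee", "1990"),
]
tokens = [make_token(key, name, year) for name, year in records]

# With k = 2, only the single "Jones" record falls in a group smaller than k.
fraction = at_risk_fraction(tokens, k=2)

# Arithmetic from the reported figures: a risk of 0.0002 over 400 000 patients.
expected_identifiable = round(0.0002 * 400_000)
```

Under this simplified rule, larger datasets tend to place more records in groups of at least k, which is consistent in direction with the negative correlation between dataset size and risk reported above.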

Discussion: Re-identification risk is deemed low for the new NCI PPRL software. Token frequency analysis provides a reliable estimate of the re-identification risk in token-based PPRL tools.

Pages: 663-669. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981660/pdf/
Citations: 0
Interpretable machine learning for identifying ICU readmission risk in subgroups with probabilistic rules.
IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-03-01 DOI: 10.1093/jamia/ocaf171
Lincen Yang, Siri L van der Meijden, Sesmu M Arbous, Matthijs van Leeuwen

Objective: Estimating readmission risk for intensive care unit (ICU) patients is critical for clinicians to optimize resource allocation and prevent premature discharges. Machine learning models currently applied to this task either lack interpretability or cannot identify patient subgroups with distinctive readmission risks and characteristics. We addressed this gap by introducing a cutting-edge rule-based model, namely truly unordered rule sets (TURS), to reveal heterogeneous readmission risks and subgroup-level patient characteristics.

Materials and methods: We trained TURS on all ICU admissions from January 2011 to January 2020 at Leiden University Medical Center. For each subgroup, patient characteristics and the influence of feature variables on readmission risk were analyzed.

Results: TURS identified subgroups with heterogeneous feature distributions and feature importance, providing actionable insights for ICU discharge planning. Its predictive performance (area under the receiver operating characteristic curve [ROC-AUC] 70.5%) and model complexity (5 rules, average length 2) surpassed other rule-based models.

Discussion: Subgroup analysis highlighted the heterogeneity of patients. First, we compared the conditional probability distribution of each feature variable, conditioned on the fact that a patient was in a certain subgroup, with its unconditional distribution. We identified features deviating from the unconditional distribution, illustrating unique subgroup-specific implications. Furthermore, we demonstrated that features with the highest impact on the readmission risk were distinctive for each subgroup.
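The comparison of a feature's subgroup-conditional distribution with its unconditional distribution can be sketched as below. The abstract does not say which deviation measure was used; total variation distance is chosen here purely as an assumption, and the rule, feature names, and data are hypothetical.

```python
from collections import Counter


def distribution(values):
    """Empirical probability of each distinct value."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def subgroup_deviation(rows, rule, feature):
    """Distance between a feature's distribution inside a subgroup and overall."""
    overall = distribution([r[feature] for r in rows])
    subgroup = distribution([r[feature] for r in rows if rule(r)])
    return total_variation(subgroup, overall)


# Hypothetical ICU records and a hypothetical rule defining one subgroup.
rows = [
    {"age": 75, "ventilated": "yes"},
    {"age": 80, "ventilated": "yes"},
    {"age": 50, "ventilated": "no"},
    {"age": 45, "ventilated": "no"},
]


def older_patients(record):
    return record["age"] >= 70


deviation = subgroup_deviation(rows, older_patients, "ventilated")
```

A deviation near 0 would mean the feature looks the same inside the subgroup as in the whole cohort; larger values flag features that characterize the subgroup.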

Conclusion: The TURS model provided a concise summary of patient subgroups, aiding ICU discharge decisions and advancing knowledge discovery in the ICU.

Pages: 690-699. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981653/pdf/
Citations: 0
Response to "toward semantic interoperability of imaging and clinical data: reflections on the DICOM-OMOP integration framework".
IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-03-01 DOI: 10.1093/jamia/ocaf216
Woo Yeon Park, Teri Sippel Schmidt, Gabriel Salvador, Kevin O'Donnell, Brad Genereaux, Kyulee Jeon, Seng Chan You, Blake E Dewey, Paul Nagy
Pages: 776-778. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981681/pdf/
Citations: 0
GARDE-Chat: a scalable, open-source platform for building and deploying health chatbots.
IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-03-01 DOI: 10.1093/jamia/ocaf211
Guilherme Del Fiol, Emerson Borsato, Richard L Bradshaw, Jiantao Bian, Alana Woodbury, Courtney Gauchel, Karen L Eilbeck, Whitney Maxwell, Kelsey Ellis, Anne C Madeo, Chelsey Schlechter, Polina V Kukhareva, Caitlin G Allen, Michael Kean, Elena B Elkin, Ravi Sharaf, Muhammad D Ahsan, Melissa Frey, Lauren Davis-Rivera, Wendy K Kohlmann, David W Wetter, Kimberly A Kaphingst, Kensaku Kawamoto

Background: Chatbots are increasingly used to deliver health education, patient engagement, and access to healthcare services. GARDE-Chat is an open-source platform designed to facilitate the development, deployment, and dissemination of chatbot-based digital health interventions across different domains and settings.

Materials and methods: GARDE-Chat was developed through an iterative process informed by real-world use cases to guide prioritization of key features. The tool was developed as an open-source platform to promote collaboration, broad dissemination, and impact across research and clinical domains.

Results: GARDE-Chat's main features include (1) a visual authoring interface that allows non-programmers to design chatbots; (2) support for scripted, large language model (LLM)-based and hybrid chatbots; (3) capacity to share chatbots with researchers and institutions; (4) integration with external applications and data sources such as electronic health records and REDCap; (5) delivery via web browsers or text messaging; and (6) detailed audit log supporting analyses of chatbot user interactions. Since its first release in July 2022, GARDE-Chat has supported the development of chatbot-based interventions tested in multiple studies, including large pragmatic clinical trials addressing topics such as genetic testing, COVID-19 testing, tobacco cessation, and cancer screening.
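To make the notions of a scripted chatbot and an audit log concrete, here is a minimal sketch. It is not GARDE-Chat's API or data model: the classes `Node` and `ScriptedChat`, the script contents, and the log fields are all hypothetical and chosen only to illustrate the general pattern.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Node:
    """One step of a script: a prompt plus allowed replies mapped to next node ids."""
    prompt: str
    options: dict = field(default_factory=dict)


@dataclass
class ScriptedChat:
    """Walks a fixed script and records every user reply in an audit log."""
    script: dict
    current: str = "start"
    audit_log: list = field(default_factory=list)

    def prompt(self) -> str:
        return self.script[self.current].prompt

    def reply(self, text: str) -> str:
        node = self.script[self.current]
        choice = text.strip().lower()
        # Unrecognized replies keep the user on the same node.
        next_id = node.options.get(choice, self.current)
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "node": self.current,
            "reply": choice,
            "next": next_id,
        })
        self.current = next_id
        return self.prompt()


# Hypothetical three-step script.
SCRIPT = {
    "start": Node("Would you like to learn about cancer screening? (yes/no)",
                  {"yes": "info", "no": "end"}),
    "info": Node("Screening can find cancer early. Type 'done' to finish.",
                 {"done": "end"}),
    "end": Node("Thank you for your time."),
}

chat = ScriptedChat(SCRIPT)
first = chat.reply("Yes")     # moves to the "info" node
second = chat.reply("maybe")  # unrecognized, stays on "info"
third = chat.reply("done")    # moves to the "end" node
```

Keeping the script as plain data is what would let a visual authoring tool generate it without programming, and the per-reply log entries are the kind of record that supports later analysis of user interactions.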

Discussion: Ongoing challenges include the effort required for developing chatbot scripts, ensuring safe use of LLMs, and integrating with clinical systems.

Conclusion: GARDE-Chat is a generalizable platform for creating, implementing, and disseminating scalable chatbot-based population health interventions. It has been validated in several studies, and it is available to researchers and healthcare systems through an open-source mechanism.

Pages: 593-602. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12798686/pdf/
Citations: 0