Journal of the American Medical Informatics Association: Latest Articles

Knowledge graph-augmented large language models for reconstructing life course risk pathways: a gestational diabetes mellitus-to-dementia case study.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-16 | DOI: 10.1093/jamia/ocaf219
Shuang Wang, Yang Zhang, Ying Gao, Xin He, Guanghui Deng, Jian Du

Objectives: To develop and evaluate a knowledge graph-augmented large language model (LLM) framework that synthesizes epidemiological evidence to infer life-course exposure-outcome pathways, using gestational diabetes mellitus (GDM) and dementia as a case study.

Materials and methods: We constructed a causal knowledge graph by extracting empirical epidemiological associations from scientific literature, excluding hypothetical assertions. The graph was integrated with GPT-4 through four graph retrieval-augmented generation (GRAG) strategies to infer bridging variables between early-life exposure (GDM) and later-life outcome (dementia). Semantic triples served as structured inputs to support LLM reasoning. Each GRAG strategy was evaluated by human clinical experts and three LLM-based reviewers (GPT-4o, Llama 3-70B, and Gemini Advanced), assessing scientific reliability, novelty, and clinical relevance.
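
The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the triples, the predicate name `increases_risk_of`, and the helper names `bridging_variables` and `build_prompt` are all hypothetical, and the toy graph only shows how two-hop paths from exposure to outcome could be collected and serialized as structured input for an LLM.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object).
# They are illustrative only and are NOT taken from the study's knowledge graph.
TRIPLES = [
    ("gestational diabetes mellitus", "increases_risk_of", "chronic kidney disease"),
    ("chronic kidney disease", "increases_risk_of", "dementia"),
    ("gestational diabetes mellitus", "increases_risk_of", "physical inactivity"),
    ("physical inactivity", "increases_risk_of", "dementia"),
    ("gestational diabetes mellitus", "increases_risk_of", "hypothetical factor X"),
]


def bridging_variables(triples, exposure, outcome):
    """Return nodes M with an edge exposure -> M and an edge M -> outcome."""
    successors = defaultdict(set)
    for subj, _pred, obj in triples:
        successors[subj].add(obj)
    return sorted(
        m for m in successors.get(exposure, set())
        if outcome in successors.get(m, set())
    )


def build_prompt(triples, exposure, outcome):
    """Serialize the triples that touch a bridging variable into prompt text."""
    mediators = set(bridging_variables(triples, exposure, outcome))
    lines = [
        f"({s}) -[{p}]-> ({o})"
        for s, p, o in triples
        if s in mediators or o in mediators
    ]
    header = f"Evidence triples linking '{exposure}' to '{outcome}':"
    return "\n".join([header, *lines])


if __name__ == "__main__":
    print(build_prompt(TRIPLES, "gestational diabetes mellitus", "dementia"))
```

In this sketch "hypothetical factor X" is dropped because it has no edge to the outcome, which mirrors the idea of restricting the LLM's context to evidence about bridging variables rather than the full corpus.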

Results: The GRAG strategy using a minimal set of abstracts specifically related to GDM-dementia bridging variables performed comparably to the strategy using broader sub-community abstracts, and both significantly outperformed approaches using the full GDM- or dementia-related corpus or baseline GPT-4 without external augmentation. The knowledge graph-augmented LLM identified 108 maternal candidate mediators, including validated risk factors such as chronic kidney disease and physical inactivity. The structured approach improved accuracy and reduced confabulation compared to standard LLM outputs.

Discussion: Our findings suggest that augmenting LLMs with epidemiological knowledge graphs enables effective reasoning over fragmented literature and supports the reconstruction of progressive risk pathways. Expert assessments revealed that LLMs may overestimate clinical relevance, highlighting the need for human-AI collaboration in interpretation and application.

Conclusion: Integrating semantic epidemiological knowledge with LLMs via GRAG strategies provides a promising framework for life-course epidemiology, enabling early detection of modifiable risk factors and guiding variable selection in cohort study design.

Citations: 0
Patient perspectives on gender identity and anatomy data collection in electronic health records: a qualitative study.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-11 | DOI: 10.1093/jamia/ocaf205
Samuel Dubin, Gabrielle Mayer, Nishant Pradhan, Madeline Xin, Richard Greene

Objectives: Documentation of gender identity (GI) and anatomy data in the electronic health record (EHR) is a proposed standard of care for transgender populations. However, there is limited research on implementation of proposed best practices, particularly anatomy data collection. This study aims to characterize factors that influence patient preferences and comfort around the collection and documentation of GI and anatomy in EHRs.

Materials and methods: From November 2023 to January 2024, 17 one-on-one, semi-structured virtual interviews were conducted with transgender adults residing in the Metropolitan New York area. Transcriptions were analyzed using inductive thematic analysis.

Results: Themes clustered around comfort and preferences for data collection processes and outcomes. Factors that influenced preferences and comfort around anatomy data were distinct from those impacting GI documentation preferences and comfort. The tension between the categories of GI and sex assigned at birth impacted anatomy data documentation preferences. Clinical context emerged as a consistent factor that impacts both preferences and comfort of GI and anatomy data documentation.

Discussion and conclusion: GI data collection efforts in clinical settings must consider the implication of anatomy data collection when determining data collection best practice methodologies. Anticipated and experienced stigma remain significant hurdles to patient comfort and willingness to collect GI and anatomy data, and their impact on actual data collection should be further elucidated among diverse gender identities. Clinical data collection methods, tools, and education warrant ongoing research investment to further elucidate best practices.

Citations: 0
Patient attitudes toward ambient artificial intelligence scribes in clinical care: insights from a cross-sectional study.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-05 | DOI: 10.1093/jamia/ocaf218
Ranganathan Chandrasekaran, Evangelos Moustakas

Objective: To assess patient attitudes towards ambient artificial intelligence (AI) scribes, including comfort, trust, perceived impact on provider interactions, and willingness for future use, and to examine how sociodemographic, health factors, digital literacy, and privacy concerns shape attitudes.

Materials and methods: We analyzed cross-sectional data from an online survey of 12 153 adults (52.4% female; 23.1% aged ≥ 65; 41.2% with chronic conditions) in Canada conducted between February 6 and March 10, 2025. Survey-adjusted ordinal and binary logistic regression models assessed predictors, reporting adjusted odds ratios (aORs), 95% confidence intervals (CIs), and P-values.

Results: Most respondents (61.8%) were reluctant about future AI scribe use despite mixed attitudes: 39.3% reported some/very high comfort, 57.4% trusted documentation with human oversight, and 49.5% anticipated positive effects on patient-provider interactions. Awareness of AI scribe use was low (28.3%). Males showed higher odds of favorable comfort (aOR = 1.13, 95% CI, 1.05-1.22, P = .001), trust (aOR = 1.21, 95% CI, 1.10-1.32, P < .001), and future use (aOR = 1.38, 95% CI, 1.27-1.51, P < .001). Respondents with chronic conditions showed higher odds of future use (aOR = 1.19, 95% CI, 1.08-1.32, P < .001), whereas poorer general health was associated with lower odds across all outcomes. Fewer emergency room/urgent care visits, lower education, and lower income levels were associated with less favorable attitudes across outcomes. Higher digital health literacy (aOR = 1.03-1.04, all P < .001) and AI knowledge (aOR = 1.28-1.37, all P < .001) showed associations with higher odds across outcomes; privacy concerns were linked to lower odds (eg, future use: aOR = 0.65, 95% CI, 0.63-0.68, P < .001).
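
For readers less familiar with how adjusted odds ratios relate to logistic regression output, the sketch below shows the usual Wald relationship: aOR = exp(beta) and the 95% CI is exp(beta ± 1.96 × SE). It back-computes an implied coefficient and standard error from one reported result (aOR = 1.13, 95% CI, 1.05-1.22) and then rebuilds the interval. The function names are hypothetical, and the assumption that the study's survey-adjusted intervals are symmetric Wald intervals on the log-odds scale is ours, not something the abstract states.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def implied_beta_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover the log-odds coefficient and SE implied by a reported OR and CI.

    Assumes the interval is symmetric on the log scale (a Wald interval).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


if __name__ == "__main__":
    # Reported for comfort among males: aOR = 1.13, 95% CI 1.05-1.22.
    beta, se = implied_beta_se(1.13, 1.05, 1.22)
    aor, low, high = odds_ratio_ci(beta, se)
    print(f"beta={beta:.4f} se={se:.4f} aOR={aor:.2f} CI=({low:.2f}, {high:.2f})")
```

Because reported values are rounded to two decimals, the recovered coefficient and standard error are approximations only.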

Discussion: Findings reveal a paradox: patients expressed conditional trust and comfort yet remained reluctant to adopt AI scribes, with privacy concerns and low awareness as key barriers.

Conclusion: Targeted interventions addressing digital literacy, privacy safeguards, and clinician-patient communication about AI scribes are needed before widespread adoption.

Citations: 0
Listening to the note: clinician perspectives on ambient artificial intelligence scribes in medical documentation.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-03 | DOI: 10.1093/jamia/ocaf214
Jen Van Tiem, Elizabeth Cramer, Christopher Iverson, Korey Kennelty, Noah Andrys, Julie Lee, Lindsey Knake, Jason Misurac, James Blum, Heather Schacht Reisinger

Objectives: To qualitatively characterize barriers and facilitators to implementing and using an ambient scribe across a large academic medical center, as well as how ambient transcription reshapes clinicians' perceptions of their work.

Materials and methods: We conducted semistructured interviews with clinicians who participated in an ambient scribe pilot (n = 8) and the initial enterprise rollout (n = 16). We sought heterogeneity by specialty, note volume, burnout, and prior time-in-notes. Interviews (26-60 min) were recorded, transcribed, and analyzed thematically using a naturalistic, ethnographic approach informed by broad implementation considerations, and an analytic lens treating note sections as documentation "genres."

Results: Clinicians described feeling more present with patients and greater satisfaction during visits. Frictions included overlong or underspecified sections (eg, History of Present Illness vs Assessment & Plan), unfamiliar formatting, and a perceived loss of "voice." Participants discussed how they used documentation to personalize practice, demonstrate expertise, manage impressions with colleagues and supervisors, and communicate sensitive findings, activities not fully captured by efficiency metrics. Inpatient and procedure-heavy contexts reported limited benefit where documentation was already highly standardized.

Discussion: Early ambient scribe implementation produced recognizable benefits, but introduced new work to reconcile AI-drafted text with local documentation genres and audience-specific communication. Tailored prompts, onboarding, and peer support may reduce the need to revise artificial intelligence (AI)-generated text.

Conclusion: Ambient scribe adoption can enhance patient interactions and perceived efficiency while reshaping how clinicians express voice and expertise in notes. Implementation strategies attentive to documentation genre and audience may help align ambient scribe outputs with clinical communication needs.

Citations: 0
Scalable confounding adjustment in real-world evidence: benchmarking data-adaptive and investigator-specified strategies in a large-scale trial emulation study.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-03 | DOI: 10.1093/jamia/ocaf204
Andrew R Weckstein, Shirley V Wang, Richard Wyss, Sebastian Schneeweiss

Objectives: Real-world evidence (RWE) increasingly informs clinical decisions, yet manual adjustment for confounding limits scalability. Data-adaptive (DA) algorithms for high-dimensional proxy adjustment show promise but have not been systematically compared to investigator-specified (IS) approaches across diverse treatment scenarios. We evaluated whether DA strategies perform comparably to manually curated IS models using claims-based emulations of 15 randomized trials from the RCT-DUPLICATE initiative.

Materials and methods: We identified new-user cohorts for 15 trial emulations in Optum's de-identified Clinformatics Data Mart Database (2004-2023). Treatment effects were estimated using 3 adjustment strategies: (1) IS models with manually tailored covariates; (2) full-DA strategies using empirical features from semiautomated pipelines; and (3) hybrid-DA models incorporating both empirical and investigator-defined covariates. Agreement with RCT benchmarks was assessed via binary metrics and difference-in-differences.

Results: Outcome-adaptive LASSO achieved better RWE-RCT agreement than IS adjustment in 73% of full-DA and 87% of hybrid-DA emulations. Other DA methods considering feature associations with both treatment and outcome performed similarly well, while models tuned solely for treatment prediction performed poorly. Performance of IS vs DA strategies differed across emulated trials.
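
As a quick sanity check on the reported percentages, the sketch below asks which whole-number counts out of 15 emulations would round to 73% and 87%. It assumes the denominator for both figures is the full set of 15 emulated trials, which the abstract does not state explicitly; `matching_counts` is a hypothetical helper written for this illustration.

```python
N_EMULATIONS = 15  # number of emulated trials reported in the abstract


def matching_counts(reported_pct, n=N_EMULATIONS):
    """Return every count k (0..n) whose share k/n rounds to reported_pct."""
    return [k for k in range(n + 1) if round(100 * k / n) == reported_pct]


if __name__ == "__main__":
    for label, pct in (("full-DA", 73), ("hybrid-DA", 87)):
        print(label, pct, matching_counts(pct))
```

Under that assumption, each percentage is consistent with exactly one count, so the two figures would correspond to 11 and 13 of the 15 emulations respectively.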

Discussion: Top DA algorithms matched manual IS models on average, but impact varied by emulation. Case studies illustrate the continued importance of subject-matter knowledge, particularly for complex treatment strategies.

Conclusion: Data-adaptive algorithms show promise for scalable confounding adjustment in large-scale evidence systems and as augmentation tools for investigator-specified designs. Hybrid strategies combining algorithmic methods with investigator expertise offer the most reliable approach for individual causal questions.

Citations: 0
Development and application of desiderata for automated clinical ordering.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-01 | DOI: 10.1093/jamia/ocaf152
Sameh N Saleh, Kevin B Johnson

Introduction: Automation of clinical orders in electronic health records (EHRs) has the potential to reduce clinician burden and enhance patient safety. However, determining which orders are appropriate for automation requires a structured framework to ensure clinical validity, transparency, and safety.

Objective: To develop and validate a framework of desiderata for assessing the appropriateness of automating clinical orders in EHRs and to demonstrate its operational value in a live health system dataset.

Materials and methods: The study comprised 4 phases to move from concept generation to real-world demonstration. First, we conducted focus group analyses using grounded theory to identify themes and developed desiderata informed by these themes and existing literature. We validated the desiderata by surveying clinicians at a single institution, presenting 10 use cases and assessing perceived appropriateness, cognitive support, and patient safety using a 4-point Likert scale. Survey results were compared to a priori appropriateness designations using t-tests. To evaluate operational impact, we analyzed one year of order-based alerts and orders (1.4 million alert firings and 44.1 million orders, respectively) using filtering rules and association rule mining to identify candidate orders for automation and their impact.

Results: We identified 8 desiderata for automated order appropriateness: logical consistency, data provenance, order transparency, context permanence, monitoring plans, trigger consistency, care team empowerment, and system accountability. Use cases deemed appropriate based on these criteria received significantly higher scores for appropriateness (3.13 ± 0.84 vs 2.30 ± 0.99), cognitive support (3.08 ± 0.82 vs 2.25 ± 0.94), and patient safety (3.08 ± 0.86 vs 2.21 ± 0.98) (all P < .001) compared to those considered inappropriate. Operational analysis revealed an alert firing 19 109 times annually, with a 96% signed order rate, where automation could save an estimated 26.5 provider hours per year. Additionally, an association rule with 16 628 occurrences (68.4% confidence) suggested automation could save 15.8 hours annually and yield 8000 additional appropriate orders.
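
The "occurrences" and "confidence" reported for the association rule follow the standard definitions: confidence is the number of encounters containing both antecedent and consequent divided by the number containing the antecedent. The sketch below computes these on a toy dataset. The order names and the `rule_stats` helper are hypothetical placeholders; the study's actual mining pipeline and item definitions are not described in the abstract.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (co-occurrence count, support, confidence) for antecedent -> consequent.

    transactions: iterable of sets of items (here, orders placed in one encounter).
    """
    transactions = list(transactions)
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / len(transactions) if transactions else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return n_both, support, confidence


# Hypothetical encounters; order names are placeholders, not real study data.
ENCOUNTERS = [
    {"order_A", "order_B"},
    {"order_A", "order_B", "order_C"},
    {"order_A"},
    {"order_C"},
]

if __name__ == "__main__":
    count, support, confidence = rule_stats(ENCOUNTERS, {"order_A"}, {"order_B"})
    print(f"occurrences={count} support={support:.2f} confidence={confidence:.2f}")
```

A rule with high confidence but imperfect coverage, such as the 68.4% figure reported above, marks the cases where the consequent order was not placed; those are the candidates an automation review would need to examine against the desiderata.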

Discussion: The desiderata align with clinician perceptions and provide a structured approach for evaluating automated orders. Our findings highlight the potential for automation of certain clinical orders to improve cognitive support while maintaining patient safety.

Conclusion: Healthcare systems should use these desiderata, coupled with data mining techniques, to systematically identify and govern appropriate automated orders. Further research is needed to validate operational scalability.

Development and application of desiderata for automated clinical ordering. Sameh N Saleh, Kevin B Johnson. Journal of the American Medical Informatics Association, pages 1899-1907; publication date 2025-12-01; Journal Article; DOI: 10.1093/jamia/ocaf152; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646375/pdf/
Citations: 0
Frameworks and methods.
IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-01 DOI: 10.1093/jamia/ocaf191
Suzanne Bakken
Journal of the American Medical Informatics Association, volume 32, issue 12, pages 1791-1792; publication date 2025-12-01; Journal Article; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646372/pdf/
Citations: 0
Hillclimb-Causal Inference: a data-driven approach to identify causal pathways among parental behaviors, genetic risk, and externalizing behaviors in children.
IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-01 DOI: 10.1093/jamia/ocaf153
Mengman Wei, Qian Peng

Objectives: Externalizing behaviors in children, such as aggression, hyperactivity, and defiance, are influenced by complex interplays between genetic predispositions and environmental factors, particularly parental behaviors. Unraveling these intricate causal relationships can benefit from the use of robust data-driven methods.

Materials and methods: We developed "Hillclimb-Causal Inference," a causal discovery approach that integrates the Hill Climb Search algorithm with a customized Linear Gaussian Bayesian Information Criterion (BIC). This method was applied to data from the Adolescent Brain Cognitive Development (ABCD) Study, which included parental behavior assessments, children's genotypes, and externalizing behavior measures. We performed dimensionality reduction to address multicollinearity among parental behaviors and assessed children's genetic risk for externalizing disorders using polygenic risk scores (PRS), which were computed based on GWAS summary statistics from independent cohorts. Once the causal pathways were identified, we employed structural equation modeling (SEM) to quantify the relationships within the model.
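A hill-climbing structure search repeatedly scores candidate parent sets for each node and keeps the changes that improve the score. The sketch below shows one common form of a linear Gaussian BIC local score using NumPy. It is a generic textbook version, not the authors' customized criterion, and the function name `local_bic` and the simulated data are assumptions made for illustration.

```python
import numpy as np


def local_bic(data: np.ndarray, child: int, parents: list[int]) -> float:
    """BIC contribution of one node under a linear Gaussian model (higher is better)."""
    n = data.shape[0]
    y = data[:, child]
    # Design matrix: intercept column plus one column per parent.
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / n  # maximum-likelihood residual variance
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    k = X.shape[1] + 1  # regression coefficients plus the variance
    return log_lik - 0.5 * k * np.log(n)


# Simulated data (hypothetical): variable 1 depends linearly on variable 0.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = 2.0 * x0 + rng.normal(size=500)
data = np.column_stack([x0, x1])

score_with_parent = local_bic(data, child=1, parents=[0])
score_without_parent = local_bic(data, child=1, parents=[])

# A greedy search step would add the edge 0 -> 1 only if it raises the score.
print(score_with_parent > score_without_parent)
```

A full search would also consider edge deletions and reversals and would reject any change that creates a cycle; those steps are omitted here.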

Results: We identified prominent causal pathways linking parental behaviors to children's externalizing outcomes. Parental alcohol misuse and broader behavioral issues exhibited notably stronger direct effects (0.33 and 0.20, respectively) compared to children's PRS (0.07). Moreover, when considering both direct and indirect paths, parental substance misuse (alcohol, drugs, and tobacco) collectively resulted in a total effect exceeding 1.1 on externalizing behaviors. Bootstrap and sensitivity analyses further validated the robustness of these findings.
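In a linear path model, the total effect of one variable on another is the direct coefficient plus, for every indirect route, the product of the coefficients along that route. The sketch below illustrates that bookkeeping on a small acyclic graph. The variable names and coefficients are hypothetical and are not the study's fitted values.

```python
def total_effect(edges: dict[tuple[str, str], float], source: str, target: str) -> float:
    """Sum over all directed paths from source to target of the product of path coefficients.

    Assumes the graph is acyclic, as in a recursive structural equation model.
    """
    if source == target:
        return 1.0
    return sum(
        weight * total_effect(edges, child, target)
        for (parent, child), weight in edges.items()
        if parent == source
    )


# Hypothetical coefficients: exposure -> outcome directly and via a mediator.
edges = {
    ("exposure", "outcome"): 0.30,
    ("exposure", "mediator"): 0.50,
    ("mediator", "outcome"): 0.40,
}

direct = edges[("exposure", "outcome")]
total = total_effect(edges, "exposure", "outcome")
indirect = total - direct

print(f"direct={direct:.2f} indirect={indirect:.2f} total={total:.2f}")
```

Because indirect paths add to the direct one, the total effect of a group of related exposures can exceed any single direct coefficient.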

Discussion and conclusion: Parental behaviors exert larger effects on children's externalizing outcomes than genetic risk, suggesting potential targets for prevention and intervention. The Hillclimb-Causal framework provides a general, data-driven way to map causal pathways in developmental psychiatry and related domains.

Journal of the American Medical Informatics Association, pages 1936-1946; publication date 2025-12-01; Journal Article; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646382/pdf/
Citations: 0
Zero-shot medical event prediction using a generative pretrained transformer on electronic health records.
IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-01 DOI: 10.1093/jamia/ocaf160
Ekaterina Redekop, Zichen Wang, Rushikesh Kulkarni, Mara Pleasure, Aaron Chin, Hamid Reza Hassanzadeh, Brian L Hill, Melika Emami, William F Speier, Corey W Arnold

Objectives: Longitudinal data in electronic health records (EHRs) represent an individual's clinical history through a sequence of codified concepts, including diagnoses, procedures, medications, and laboratory tests. Generative pretrained transformers (GPT) can leverage this data to predict future events. While fine-tuning of these models can enhance task-specific performance, it becomes costly when applied to many clinical prediction tasks. In contrast, a pretrained foundation model can be used in a zero-shot forecasting setting, offering a scalable alternative to fine-tuning separate models for each outcome.

Materials and methods: This study presents the first comprehensive analysis of zero-shot forecasting with GPT-based foundational models in EHRs, introducing a novel pipeline that formulates medical concept prediction as a generative modeling task. Unlike supervised approaches requiring extensive labeled data, our method enables the model to forecast the next medical event purely from pretraining knowledge. We evaluate performance across multiple time horizons and clinical categories, demonstrating the model's ability to capture latent temporal dependencies and complex patient trajectories without task supervision.

Results: The model's performance in predicting the next medical concept was evaluated using precision and recall metrics, achieving an average top-1 precision of 0.614 and recall of 0.524. For 12 major diagnostic conditions, the model demonstrated strong zero-shot performance, achieving high true positive rates while maintaining low false positives.
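Precision and recall for next-event prediction can be computed per concept by comparing the model's top-ranked prediction with the event that actually occurred. The sketch below shows one plausible way to do this. The abstract does not specify how its averages were defined, so the per-label formulation, the function name, and the toy labels are all assumptions.

```python
from collections import Counter


def per_label_precision_recall(y_true: list[str], y_pred: list[str]) -> dict[str, tuple[float, float]]:
    """Return {label: (precision, recall)} for top-1 predictions."""
    true_pos, predicted, actual = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        predicted[guess] += 1
        actual[truth] += 1
        if truth == guess:
            true_pos[truth] += 1
    labels = set(predicted) | set(actual)
    return {
        label: (
            true_pos[label] / predicted[label] if predicted[label] else 0.0,
            true_pos[label] / actual[label] if actual[label] else 0.0,
        )
        for label in labels
    }


# Toy next-event labels (hypothetical concept categories).
y_true = ["diagnosis", "medication", "diagnosis", "lab"]
y_pred = ["diagnosis", "diagnosis", "diagnosis", "lab"]

scores = per_label_precision_recall(y_true, y_pred)
macro_precision = sum(p for p, _ in scores.values()) / len(scores)
macro_recall = sum(r for _, r in scores.values()) / len(scores)

for label in sorted(scores):
    print(label, scores[label])
```

Averaging the per-label values gives macro-averaged precision and recall; a study could equally report micro-averaged or per-patient figures.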

Discussion: We demonstrate the power of a foundational EHR GPT model in capturing diverse phenotypes and enabling robust, zero-shot forecasting of clinical outcomes. This capability highlights both its versatility across conditions like liver cancer and SLE, and its limitations in more ambiguous settings such as depression, while also revealing meaningful latent clinical structure.

Conclusion: This capability enhances the versatility of predictive healthcare models and reduces the need for task-specific training, enabling more scalable applications in clinical settings.

Journal of the American Medical Informatics Association, pages 1833-1842; publication date 2025-12-01; Journal Article; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646381/pdf/
Citations: 0
A SNAPpy use of large language models: using large language models to classify treatment plans in pediatric acute otitis media.
IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-01 DOI: 10.1093/jamia/ocaf170
Jessica J Pourian, Ben Michaels, Anh Vo, A Jay Holmgren, Augusto Garcia-Agundez, Valerie Flaherman

Background and significance: Acute otitis media (AOM) is a leading cause of pediatric antibiotic overuse. Safety Net Antibiotic Prescriptions (SNAPs) are recommended for antibiotic stewardship but are difficult to identify due to lack of structured documentation.

Objective: This study validates the accuracy of Versa, a GPT-4o based HIPAA-compliant large language model (LLM), to classify AOM treatment plans from physician notes.

Methods: A retrospective cross-sectional study analyzed pediatric AOM encounters. Multiple prompting strategies were used to classify treatment plans and validated against a representative sample of manual reviews by 2 pediatricians. A locally fine-tuned model, Clinical-Longformer, was also trained and tested against Versa and human review.

Results: In total, 5707 encounters were included; 374 were reviewed manually. Zero-shot accuracy was 97.8%; few-shot accuracy was 85%. Clinical-Longformer achieved 93.3% accuracy.

Conclusion: Versa effectively identifies AOM treatment plans, providing a cost-efficient quality improvement tracking tool for prescription practice patterns in pediatric antibiotic stewardship efforts.

Journal of the American Medical Informatics Association, pages 1947-1951; publication date 2025-12-01; Journal Article; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646383/pdf/
Citations: 0