
Journal of the American Medical Informatics Association: latest articles

Factors influencing the effectiveness of artificial intelligence-assisted decision-making in medicine: a scoping review.
IF 4.6; Zone 2 (Medicine); Q1 Computer Science, Information Systems; Pub Date: 2026-01-28; DOI: 10.1093/jamia/ocag002
Nicholas J Jackson, Katherine E Brown, Rachael Miller, Matthew Murrow, Michael R Cauley, Benjamin X Collins, Laurie L Novak, Natalie C Benda, Jessica S Ancker

Objectives: Research on artificial intelligence (AI)-based clinical decision-support (AI-CDS) systems has returned mixed results. Sometimes providing AI-CDS to a clinician will improve decision-making performance, sometimes it will not, and it is not always clear why. This scoping review seeks to clarify existing evidence by identifying clinician-level and technology design factors that impact the effectiveness of AI-assisted decision-making in medicine.

Materials and methods: We searched MEDLINE, Web of Science, and Embase for peer-reviewed papers that studied factors impacting the effectiveness of AI-CDS. We identified the factors studied and their impact on 3 outcomes: clinicians' attitudes toward AI, their decisions (eg, acceptance rate of AI recommendations), and their performance when utilizing AI-CDS.

Results: We retrieved 5850 articles and included 45. Four clinician-level and technology design factors were commonly studied. Expert clinicians may benefit less from AI-CDS than nonexperts, with some mixed results. Explainable AI increased clinicians' trust, but could also increase trust in incorrect AI recommendations, potentially harming human-AI collaborative performance. Clinicians' baseline attitudes toward AI predict their acceptance rates of AI recommendations. Of the 3 outcomes of interest, human-AI collaborative performance was most commonly assessed.

Discussion and conclusion: Few factors have been studied for their impact on the effectiveness of AI-CDS. Because outcomes conflict between studies, we recommend that future work leverage the concept of "appropriate trust" to facilitate more robust research on AI-CDS, aiming not to increase overall trust in or acceptance of AI but to ensure that clinicians accept AI recommendations only when trust in AI is warranted.
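One way to make the "appropriate trust" idea concrete is to report acceptance rates separately for correct and incorrect AI recommendations instead of a single overall acceptance rate. The sketch below is only an illustration of that split: the function name, the `(ai_correct, accepted)` record shape, and the toy data are assumptions made here and do not come from the review.

```python
from typing import Iterable, Tuple


def acceptance_by_ai_correctness(
    cases: Iterable[Tuple[bool, bool]]
) -> Tuple[float, float]:
    """Return (acceptance rate when AI was correct, acceptance rate when AI was incorrect).

    Each case is a hypothetical (ai_correct, accepted) pair for one decision.
    """
    accepted_correct = total_correct = 0
    accepted_incorrect = total_incorrect = 0
    for ai_correct, accepted in cases:
        if ai_correct:
            total_correct += 1
            accepted_correct += int(accepted)
        else:
            total_incorrect += 1
            accepted_incorrect += int(accepted)
    rate_correct = accepted_correct / total_correct if total_correct else 0.0
    rate_incorrect = accepted_incorrect / total_incorrect if total_incorrect else 0.0
    return rate_correct, rate_incorrect


# Toy data (made up for illustration): 6 correct AI recommendations, 4 incorrect ones.
cases = (
    [(True, True)] * 5 + [(True, False)]
    + [(False, True)] + [(False, False)] * 3
)
rate_correct, rate_incorrect = acceptance_by_ai_correctness(cases)
print(f"accepted when AI correct:   {rate_correct:.2f}")
print(f"accepted when AI incorrect: {rate_incorrect:.2f}")
```

Under this framing, an intervention that raises both rates equally increases overall trust without making it more appropriate; the gap between the two rates is the quantity of interest.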

Citations: 0
Translating evidence into practice: adapting TrialGPT for real-world clinical trial eligibility screening.
IF 4.6; Zone 2 (Medicine); Q1 Computer Science, Information Systems; Pub Date: 2026-01-27; DOI: 10.1093/jamia/ocag006
Mahanazuddin Syed, Muayad Hamidi, Manju Bikkanuri, Nicole Adele Dierschke, Haritha Vardhini Katragadda, Meredith Zozus, Antonio Lucio Teixeira

Objectives: To evaluate the performance of a locally deployed adaptation of TrialGPT, a large language model (LLM) system for identifying trial-eligible patients from unstructured electronic health record (EHR) data.

Materials and methods: TrialGPT was re-engineered for secure deployment at UT Health San Antonio using a locally hosted LLM. It was optimized for real-world data needs through a longitudinal patient-encounter-note hierarchy mirroring EHR documentation. Performance was evaluated in two stages: (1) benchmarking against an expert-adjudicated gold corpus (n = 149) and (2) comparative validation against manual screening (n = 55).

Results: Against the expert-adjudicated corpus, the system achieved 81.8% sensitivity, 97.8% specificity, and a positive predictive value of 75.0%. Compared with manual screening, it identified more than twice as many truly eligible patients (81.8% vs 36.4%) while preserving equivalent specificity.
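For readers who want to see how the three reported metrics relate, the sketch below computes sensitivity, specificity, and positive predictive value from a 2x2 confusion matrix. The abstract does not report the underlying counts; the counts used here (9 true positives, 2 false negatives, 3 false positives, 135 true negatives) are an assumption, chosen only because they sum to 149 and reproduce the reported percentages after rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # eligible patients flagged as eligible
    fn: int  # eligible patients missed
    fp: int  # ineligible patients flagged as eligible
    tn: int  # ineligible patients correctly excluded


def sensitivity(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    return c.tn / (c.tn + c.fp)


def ppv(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fp)


# Hypothetical counts (not reported in the abstract); they total n = 149.
gold = ConfusionCounts(tp=9, fn=2, fp=3, tn=135)

print(f"sensitivity: {100 * sensitivity(gold):.1f}%")
print(f"specificity: {100 * specificity(gold):.1f}%")
print(f"PPV:         {100 * ppv(gold):.1f}%")
```

Other count combinations could produce the same rounded figures, so this should be read as an arithmetic illustration and not as the study's actual confusion matrix.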

Conclusion: The adapted TrialGPT framework operationalizes trial matching, translating EHR data into actionable screening intelligence for efficient, scalable clinical trial recruitment.

Citations: 0
NutriRAG: unleashing the power of large language models for food identification and classification through retrieval methods.
IF 4.6; Zone 2 (Medicine); Q1 Computer Science, Information Systems; Pub Date: 2026-01-23; DOI: 10.1093/jamia/ocag003
Huixue Zhou, Lisa Chow, Lisa Harnack, Satchidananda Panda, Emily N C Manoogian, Mingchen Li, Yongkang Xiao, Rui Zhang

Objectives: This study explores the use of advanced natural language processing (NLP) techniques to enhance food classification and dietary analysis using raw text input from a diet tracking app.

Materials and methods: The study was conducted in 3 stages: data collection, framework development, and application. Data were collected from a 12-week randomized controlled trial (RCT: NCT04259632), in which participants recorded their meals in free-text format using the myCircadianClock app. Only de-identified data were used. We developed nutrition-focused retrieval-augmented generation (NutriRAG), an NLP framework that uses a retrieval-augmented generation approach to enhance food classification from free-text inputs. The framework retrieves relevant examples from a curated database and then leverages large language models, such as GPT-4, to classify user-recorded food items into predefined categories without fine-tuning. NutriRAG was then applied to data from the RCT, which included 77 adults with obesity recruited from the Twin Cities metro area and randomized into 3 intervention groups: time-restricted eating (TRE, 8-hour eating window), caloric restriction (CR, 15% reduction), and unrestricted eating.
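The retrieve-then-classify pattern described above can be sketched roughly as follows. This is not the NutriRAG implementation: the curated examples, the category names, and the Jaccard token-overlap retriever are all stand-ins assumed here for illustration, and the sketch stops at building the few-shot prompt without calling any language model.

```python
from typing import List, Set, Tuple

# Hypothetical curated examples: (free-text entry, category label).
CURATED: List[Tuple[str, str]] = [
    ("black coffee", "beverage"),
    ("chocolate chip cookie", "sweets"),
    ("grilled chicken breast", "protein"),
    ("apple slices", "fruit"),
]
# Hypothetical predefined categories.
CATEGORIES = ["beverage", "sweets", "protein", "fruit"]


def tokenize(text: str) -> Set[str]:
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, k: int = 2) -> List[Tuple[str, str]]:
    """Return the k curated examples with the highest token overlap."""
    q = tokenize(query)
    ranked = sorted(
        CURATED, key=lambda ex: jaccard(q, tokenize(ex[0])), reverse=True
    )
    return ranked[:k]


def build_prompt(query: str, k: int = 2) -> str:
    """Assemble a few-shot prompt from retrieved examples (no model call)."""
    parts = ["Classify the food entry into one of: " + ", ".join(CATEGORIES) + "."]
    for text, label in retrieve(query, k):
        parts.append(f"Entry: {text}\nCategory: {label}")
    parts.append(f"Entry: {query}\nCategory:")
    return "\n\n".join(parts)


print(build_prompt("iced coffee with milk"))
```

Because the retrieved examples are placed in the prompt at inference time, the categories or the curated database can change without retraining, which matches the "without fine-tuning" point in the abstract. A production retriever would more likely use embedding similarity than token overlap.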

Results: NutriRAG significantly enhanced classification accuracy and helped to analyze dietary habits; the retrieval-augmented GPT-4 model achieved a micro-F1 score of 82.24. Both interventions showed dietary alterations: CR participants ate fewer snacks and sugary foods, while TRE participants reduced nighttime eating.
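As a reminder of what a micro-F1 score measures, the sketch below pools true positives, false positives, and false negatives across all categories before computing F1. It assumes a single-label setting (one category per food entry), which the abstract does not state, and the labels and predictions are toy data invented for illustration. The reported 82.24 is presumably on a 0 to 100 scale, whereas this function returns a value between 0 and 1.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over every label, then compute F1."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label and truth != label:
                fp += 1
            elif truth == label and pred != label:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels and predictions (not from the study).
y_true = ["fruit", "snack", "beverage", "snack", "fruit"]
y_pred = ["fruit", "snack", "snack", "snack", "beverage"]

print(f"micro-F1: {micro_f1(y_true, y_pred):.2f}")
```

In the single-label case every misclassification counts once as a false positive and once as a false negative, so micro-F1 coincides with plain accuracy (3 of 5 correct here).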

Conclusion: By using artificial intelligence, NutriRAG marks a substantial advancement in food classification and dietary analysis for nutritional assessment. The findings highlight NLP's potential to personalize nutrition and manage diet-related health issues, and they motivate further research to extend these models to wider use.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13005737/pdf/
Citations: 0
The impact of artificial intelligence scribes on physician and advanced practice provider cognitive load and well-being.
IF 4.6; Zone 2 (Medicine); Q1 Computer Science, Information Systems; Pub Date: 2026-01-21; DOI: 10.1093/jamia/ocag005
Kathryn R Schneider, Hillary E Swann-Thomsen, Terry G Ribbens, Lucas A Bahnmaier, Trevor Satterfield, Reme Pullicar, Neeraj Soni

Background and significance: Physician and advanced practice provider (APP) well-being is a critical focus in healthcare. Emerging technology such as generative artificial intelligence (GAI) scribes reduces physician and APP administrative burden created by electronic health records. Early adopters of this technology have demonstrated promising improvements in clinical documentation, well-being, and cognitive load. However, further exploration across professional roles is warranted.

Objective: The goal of this quality improvement initiative was to explore how GAI scribes impacted well-being, cognitive load, and practice efficiency among physicians and APPs across professional roles.

Methods: A cross-sectional anonymous survey was conducted prior to implementation of GAI scribe technology and 3 months after physicians and APPs were onboarded.

Results: Physicians and APPs showed a reduction in cognitive task load following scribe technology implementation. Physicians reported reduced burnout and intent to leave; however, APPs did not have a significant reduction in burnout or intent to leave.

Conclusion: Artificial intelligence scribe technology shows potential for improving well-being among physicians and APPs by reducing cognitive load and clinical documentation time. Although some differences were found, overall, the technology appears to hold promise across professional roles.

Citations: 0
From use cases to infrastructure: a cross-institutional survey of priorities in data-driven biomedical research.
IF 4.6; Zone 2 (Medicine); Q1 Computer Science, Information Systems; Pub Date: 2026-01-20; DOI: 10.1093/jamia/ocag001
Raja Mazumder, Jonathon Keeney, Luke Johnson, Lori Krammer, Patrick McNeely, Jorge Sepulveda, Danielle Hangen, Maria Martin, Dushyanth Jyothi, Jonas De Almeida, Peter McGarvey, Adil Alaoui, Sarah Cha, Art Sedrakyan, Evan Shoelle, Michael Matheny, Michele LeNoue-Newton, Robert Winter, Stephen Deppen, Vahan Simonyan, Anelia Horvath

Objectives: Federated Ecosystems for Analytics and Standardized Technologies (FEAST) is a modular, cloud-based platform developed through the ARPA-H Biomedical Data Fabric initiative to enable secure, federated analysis of real-world biomedical data. To guide and iteratively refine its modular design, the FEAST team conducted a cross-institutional survey to systematically identify and prioritize research needs related to authorized-access data across diverse biomedical domains. This study presents a structured synthesis of submitted use cases to uncover infrastructure gaps, data integration challenges, and translational opportunities. The results from the survey inform both front-end user-facing functionality and backend data requirements, shaping how the interface supports user interactions, data types, and compliance with security and interoperability standards.

Materials and methods: A structured survey form was distributed to researchers affiliated with participating institutions, including DNA-HIVE, The George Washington University (GW-FEAST), Weill Cornell Medicine, Vanderbilt University Medical Center, Georgetown University, European Bioinformatics Institute, and Kaiser Permanente. Respondents completed standardized fields describing the data types of interest, project goals, analytic methods, and perceived technical barriers. The collected responses were curated and analyzed to identify common needs related to privacy, interoperability, scalability, and workflow reproducibility.

Results: The survey compiled 61 use cases spanning genomics, imaging, clinical phenotyping, EHR-driven analytics, and precision medicine. Common themes included the need for multi-modal data integration, HL7 FHIR-based secure access, federated model training without PII retention, and containerized microservices for scalable deployment. Convergent needs across institutions emphasized consistent demand for FAIR-compliant infrastructure and readiness for real-world data analytics.

Conclusion: The FEAST Use Cases survey provides a cross-sectional view of biomedical informatics priorities grounded in real-world data needs. The findings offer a strategic blueprint for developing federated, privacy-preserving infrastructure to support secure, collaborative, and scalable biomedical research.

Citations: 0
Positive act of reporting negative results in large language model research: a call for transparency.
IF 4.6; Zone 2 (Medicine); Q1 Computer Science, Information Systems; Pub Date: 2026-01-19; DOI: 10.1093/jamia/ocaf221
Satvik Tripathi, Dana Alkhulaifat, Tessa S Cook

Purpose: To highlight the importance of reporting negative results in large language model (LLM) research, particularly as these systems are increasingly integrated into healthcare.

Potential: LLMs offer transformative capabilities in text generation, summarization, and clinical decision support. Transparent documentation of both successes and failures can accelerate innovation, improve reproducibility, and guide safe deployment.

Caution: Publication bias toward positive findings conceals model limitations, biases, and reproducibility challenges. In healthcare, underreporting failures risks patient safety, ethical lapses, and wasted resources. Structural barriers, including a lack of standards and limited funding for failure analysis, perpetuate this cycle.

Conclusions: Negative results should be recognized as valuable contributions that delineate the boundaries of LLM applicability. Structured reporting, educational initiatives, and stronger incentives for transparency are essential to ensure responsible, equitable, and trustworthy use of LLMs in healthcare.

Citations: 0
Contextualizing key principles to promote a justice-oriented informatics research agenda: proceedings and reflections from an American Medical Informatics Association workshop.
IF 4.6; Zone 2 (Medicine); Q1 Computer Science, Information Systems; Pub Date: 2026-01-19; DOI: 10.1093/jamia/ocaf210
Aparajita Kashyap, Christopher J Allsman, Elizabeth A Campbell, Pooja M Desai, Salvatore G Volpe, Bria P Massey, Tiffani J Bright, Suzanne Bakken, Oliver J Bear Don't Walk IV, Adrienne Pichon

Objectives: Advancing health through informatics requires attending to justice. Recent policy changes in the United States have introduced significant barriers to promoting justice within informatics due to targeted funding cuts and hostility to science, especially science that prioritizes justice.

Materials and methods: We present five key principles for advancing a justice-oriented informatics agenda, synthesized from our workshop held at the American Medical Informatics Association 2022 Annual Symposium.

Results: These principles are: (1) Recognize knowledge and methodologies across communities; (2) Acknowledge historical and cultural contexts of interactions; (3) Facilitate transparency and accountability through clear measures and metrics; (4) Foster trust and sustainability; and (5) Equitably allocate compensation and resources.

Discussion and conclusion: We discuss barriers to implementing these principles that have arisen since the 2022 workshop and provide recommendations for moving towards justice-oriented informatics. We offer examples of how these principles may be used to frame challenges and adapt to new barriers within biomedical informatics (BMI).

Citations: 0
Dependence of premature ventricular complexes on heart rate - it's not that simple.
IF 4.6, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-01-01; DOI: 10.1093/jamia/ocaf069
Adrien Osakwe, Noah Wightman, Marc W Deyell, Zachary Laksman, Alvin Shrier, Gil Bub, Leon Glass, Thomas M Bury

Objective: Frequent premature ventricular complexes (PVCs) can lead to adverse health conditions such as cardiomyopathy. The linear correlation between PVC frequency and heart rate (as positive, negative, or neutral) on a 24-hour Holter recording has been proposed as a way to classify patients and guide treatment with beta-blockers. Our objective was to evaluate the robustness of this classification to measurement methodology, different 24-hour periods, and nonlinear dependencies of PVCs on heart rate.

Materials and methods: We analyzed 82 multi-day Holter recordings (1-7 days) collected from 48 patients with frequent PVCs (burden 1%-44%). For each recording, the linear correlation between PVC frequency and heart rate was computed for different 24-hour periods, using intervals of different lengths to determine PVC frequency.

Results: Using a 1-hour interval, the correlation between PVC frequency and heart rate was consistently positive, negative, or neutral on different days in only 36.6% of patients. Using shorter time intervals, the correlation was consistent in 56.1% of patients. Shorter time intervals revealed nonlinear and piecewise linear relationships between PVC frequency and heart rate in many patients.

Discussion: The variability of the correlation between PVC frequency and heart rate across different 24-hour periods and interval durations suggests that the relationship is neither strictly linear nor stationary. A better understanding of the mechanism driving the PVCs, combined with computational and biological models that represent these mechanisms, may provide insight into the observed nonlinear behavior and guide more robust classification strategies.

Conclusion: Linear correlation as a tool to classify patients with frequent PVCs should be used with caution. It is sensitive to the specific 24-hour period analyzed and the methodology used to segment the data. More sophisticated classification approaches that can capture nonlinear and time-varying dependencies should be developed and considered in clinical practice.
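
The dependence on interval length described in the Materials and methods and Results can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the use of total beats per minute as "heart rate", PVCs per minute as "PVC frequency", and the 0.4 cut-off for calling a correlation positive or negative are all assumptions made for illustration, and the recording is synthetic.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one series: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def binned_rates(beat_times, is_pvc, interval_s):
    """Per-interval heart rate (beats/min) and PVC frequency (PVCs/min).

    beat_times are seconds from the start of the recording. A trailing
    partial interval is not treated specially in this sketch.
    """
    bins = {}
    for t, pvc in zip(beat_times, is_pvc):
        k = int(t // interval_s)
        beats, pvcs = bins.get(k, (0, 0))
        bins[k] = (beats + 1, pvcs + (1 if pvc else 0))
    minutes = interval_s / 60.0
    keys = sorted(bins)
    heart_rate = [bins[k][0] / minutes for k in keys]
    pvc_freq = [bins[k][1] / minutes for k in keys]
    return heart_rate, pvc_freq


def classify(r, threshold=0.4):
    """Label a correlation; the threshold is a hypothetical choice."""
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "neutral"


def synthetic_day():
    """Deterministic 24-hour recording in which PVCs rise with heart rate."""
    beat_times, is_pvc = [], []
    for minute in range(24 * 60):
        phase = 2 * math.pi * minute / (24 * 60)
        beats = 60 + round(20 * math.sin(phase))  # 40 to 80 beats this minute
        n_pvc = (beats - 40) // 4                 # 0 to 10 PVCs this minute
        for i in range(beats):
            beat_times.append(minute * 60 + i * 60.0 / beats)
            is_pvc.append(i < n_pvc)
    return beat_times, is_pvc


if __name__ == "__main__":
    times, flags = synthetic_day()
    for interval_s in (3600, 300):  # 1-hour and 5-minute intervals
        hr, pvc = binned_rates(times, flags, interval_s)
        r = pearson(hr, pvc)
        print(interval_s, round(r, 3), classify(r))
```

On real recordings, repeating the loop for each 24-hour period and each interval length would show whether a patient's label stays the same, which is the consistency the abstract reports.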

Citations: 0
Enhancing end-stage renal disease outcome prediction: a multisourced data-driven approach.
IF 4.6, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-01-01; DOI: 10.1093/jamia/ocaf118
Yubo Li, Rema Padman

Objectives: To improve prediction of chronic kidney disease (CKD) progression to end-stage renal disease (ESRD) using machine learning (ML) and deep learning (DL) models applied to integrated clinical and claims data with varying observation windows, supported by explainable artificial intelligence (AI) to enhance interpretability and reduce bias.

Materials and methods: We utilized data from 10 326 CKD patients, combining clinical and claims information from 2009 to 2018. After preprocessing, cohort identification, and feature engineering, we evaluated multiple statistical, ML and DL models using 5 distinct observation windows. Feature importance and SHapley Additive exPlanations (SHAP) analysis were employed to understand key predictors. Models were tested for robustness, clinical relevance, misclassification patterns, and bias.

Results: Integrated data models outperformed single data source models, with long short-term memory achieving the highest area under the receiver operating characteristic curve (AUROC) (0.93) and F1 score (0.65). A 24-month observation window optimally balanced early detection and prediction accuracy. The 2021 estimated glomerular filtration rate (eGFR) equation improved prediction accuracy and reduced racial bias, particularly for African American patients.
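
The two metrics reported here measure different things, which is why an AUROC of 0.93 can coexist with an F1 score of 0.65: AUROC summarizes how well scores rank positives above negatives across all thresholds, whereas F1 depends on one decision threshold and on class balance. The sketch below computes both from scratch on toy data. The function names, the 0.5 threshold, and the example labels and scores are assumptions for illustration and are unrelated to the study's data.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random
    negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1 at a fixed decision threshold (hypothetical default of 0.5)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print("AUROC:", auroc(y_true, y_score))
    print("F1:", f1_score(y_true, y_score))
```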

Discussion: Improved prediction accuracy, interpretability, and bias mitigation strategies have the potential to enhance CKD management, support targeted interventions, and reduce health-care disparities.

Conclusion: This study presents a robust framework for predicting ESRD outcomes, improving clinical decision-making through integrated multisourced data and advanced analytics. Future research will expand data integration and extend this framework to other chronic diseases.
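
For readers unfamiliar with the "2021 eGFR equation" mentioned in the Results, the sketch below assumes it refers to the race-free CKD-EPI 2021 creatinine equation; the abstract does not spell this out. The coefficients are the published ones for that equation, while the function name and argument names are hypothetical. It is an illustration only and not a clinical tool.

```python
def egfr_ckd_epi_2021(scr_mg_dl, age_years, is_female):
    """Estimated GFR in mL/min/1.73 m^2 from serum creatinine (mg/dL),
    using the CKD-EPI 2021 creatinine equation, which has no race term."""
    kappa = 0.7 if is_female else 0.9
    alpha = -0.241 if is_female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = (142.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.200
            * 0.9938 ** age_years)
    if is_female:
        egfr *= 1.012
    return egfr


if __name__ == "__main__":
    # Hypothetical patient: 60-year-old woman, serum creatinine 1.4 mg/dL.
    print(round(egfr_ckd_epi_2021(1.4, 60, True), 1))
```

Because the estimate depends only on creatinine, age, and sex, recomputing eGFR this way removes the race coefficient that earlier equations used, which is the mechanism by which a change of equation could affect bias in downstream models.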

Citations: 0
SDoH-GPT: using large language models to extract social determinants of health.
IF 4.6, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-01-01; DOI: 10.1093/jamia/ocaf094
Bernardo Consoli, Haoyang Wang, Xizhi Wu, Song Wang, Xinyu Zhao, Yanshan Wang, Justin Rousseau, Tom Hartvigsen, Li Shen, Huanmei Wu, Yifan Peng, Qi Long, Tianlong Chen, Ying Ding

Objective: Extracting social determinants of health (SDoHs) from medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. Here, we introduce SDoH-GPT, a novel framework leveraging few-shot learning large language models (LLMs) to automate the extraction of SDoH from unstructured text, aiming to improve both efficiency and generalizability.

Materials and methods: SDoH-GPT combines two components: few-shot LLM prompting, which extracts SDoH from medical notes, and XGBoost classifiers, which are trained on the LLM-generated annotations and then classify SDoH. The combination exploits the strength of LLMs as few-shot learners and the efficiency of XGBoost once the training dataset is sufficiently large. As a result, SDoH-GPT can extract SDoH without relying on extensive medical annotations or costly human intervention.

Results: Our approach achieved tenfold and twentyfold reductions in time and cost, respectively, and superior consistency with human annotators, with Cohen's kappa of up to 0.92. The combination of LLM and XGBoost delivers high accuracy and computational efficiency while consistently maintaining AUROC scores of 0.90 or higher.

Discussion: This study has verified SDoH-GPT on three datasets and highlights the potential of leveraging LLM and XGBoost to revolutionize medical note classification, demonstrating its capability to achieve highly accurate classifications with significantly reduced time and cost.

Conclusion: The key contribution of this study is the integration of LLMs with XGBoost, which enables cost-effective, high-quality annotation of SDoH. This research sets the stage for SDoH extraction to become more accessible, scalable, and impactful in driving future healthcare solutions.
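
Agreement with human annotators is reported above as Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch of that statistic follows. The function name and the toy label lists are assumptions for illustration and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    human = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical human labels
    model = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]  # hypothetical model labels
    print(round(cohens_kappa(human, model), 3))
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement, so the reported value of up to 0.92 indicates near-complete agreement on the best-performing task.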

Citations: 0