
Journal of the American Medical Informatics Association: Latest Articles

The Potential Implications of Informatics for Value-Based Care.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-19 | DOI: 10.1093/jamia/ocag009
Chen Dun, Caitlin W Hicks, Harold P Lehmann

Objective: Value-based care (VBC) represents a fundamental shift from volume-driven reimbursement to models focused on improving patient outcomes and reducing costs. Informatics plays an essential, but often underappreciated, role in enabling VBC. Traditional discussions of informatics emphasize data and technology; however, a broader sociotechnical view highlights how people, organizations, workflows, and policies interact with technology to influence the success of VBC initiatives. In this article, we apply the Informatics Stack as a heuristic framework to examine how informatics shapes VBC across 4 phases: research, policy setting, healthcare implementation, and local assessment within learning health systems.

Materials and methods: We applied the Informatics Stack as a heuristic framework to analyze VBC across 4 phases: research, policy setting, healthcare implementation, and local assessment. To provide a grounded analysis, the study focused on the Healthcare Implementation phase, using vascular claudication management as the primary illustrative case to demonstrate how high-level VBC policies are converted into granular clinical workflows and algorithms.

Results: We present "As-Is" characterizations of informatics in VBC at multiple levels of the Stack, ranging from world-level regulatory forces to organizational values, business processes, workflows, information systems, modules, algorithms, data, and underlying technologies. We also outline "To-Be" opportunities, including computable clinical guidelines, interoperable data platforms, algorithm performance monitoring, and integration of multimodal data streams into decision support. To provide a grounded analysis, we narrow our focus to the Healthcare Implementation phase, using vascular claudication management as our primary illustrative case. Managing claudication in a VBC model requires preventing low-value care, such as early, aggressive peripheral vascular interventions, while optimizing patient-specific outcomes. We use this clinical example to walk down the levels of the Stack, demonstrating how informatics converts high-level VBC policy into granular clinical workflows and algorithms.

Discussion: In this article, we apply the Informatics Stack as a heuristic framework to examine how informatics shapes VBC across 4 phases, focusing specifically on the Healthcare Implementation phase and using vascular claudication management as an illustrative case. We present "As-Is" characterizations of the current state of informatics alongside "To-Be" opportunities, including computable clinical guidelines, interoperable data platforms, and algorithm performance monitoring. Concluding that VBC requires a sociotechnical perspective beyond mere data and technology, we propose the Stack as a diagnostic tool for health leaders and offer a "VBC Informatics Gap Analysis Toolkit" to help organizations identify alignment gaps in their implementation strategies.

Conclusion: In teaching informatics, we have found the Stack helpful to students when assessing where a problem or domain stands at a given point in time, and many alumni apply it in their work. For health system leaders attempting to implement VBC, however, the Stack must be more than a theoretical model; it must be a diagnostic tool. It does not prescribe specific software purchases; instead, it points organizations to where their conception of informatics infrastructure may be inconsistent or where their conceptualization has notable gaps.
Citations: 0
Implementation of a clinical decision support tool for hypertension in a rural health system.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-17 | DOI: 10.1093/jamia/ocag031
Elyse O Kharbanda, Stephen E Asche, Inih Essien, Clayton I Allen, Laura A Freitag, Heidi L Ekstrom, Kay A Kromrey, Abhilash Muthineni, Daniel M Saman, Vijayakumar Thirumalai, Patrick J O'Connor, Catherine P Benziger

Objectives: Elevated blood pressure (BP) and hypertension are often overlooked in pediatric care. We adapted a pediatric hypertension clinical decision support (CDS) for a primarily rural health system and compared CDS impact across varied implementation approaches.

Methods: In this cluster randomized trial, 40 primary care clinics were randomized 1:1:1 to CDS with high-intensity implementation, CDS with low-intensity implementation, or usual care (UC). Low-intensity implementation was limited to online training. High-intensity CDS implementation included in-person and online training, monthly check-ins and feedback regarding CDS use. Patients 6-17 years with BP measured at a primary care visit from August 1, 2022 to January 31, 2024 were eligible. Outcomes were remeasurement of elevated BP during a visit and recognition of hypertension within 6 months of meeting criteria. Analyses adjusted for clustered study design and patient characteristics.

Results: Of 9155 patients with an elevated BP, remeasurement during the visit occurred for 51.5% at high-intensity, 23.6% at low-intensity, and 6.2% at UC clinics. Among 578 patients with incident hypertension, recognition was 42.8% at high-intensity, 24.5% at low-intensity, and 14.4% at UC clinics. Patients attending high- or low-intensity CDS clinics were more likely than those at UC clinics to have elevated BP remeasured (adjusted odds ratio [aOR] 8.70; 95% CI 5.68-13.3) and to have their hypertension clinically recognized (aOR 2.94; 95% CI 1.00-8.60). High-intensity implementation was more effective than low-intensity implementation for repeat BP measurement (aOR 3.45; 95% CI 1.88-6.33) and hypertension recognition (aOR 2.31; 95% CI 1.08-4.98).

Conclusions: CDS improved pediatric BP care in a primarily rural health system, although effectiveness varied by implementation approach.
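
To make the relationship between the reported proportions and odds ratios concrete, the following is a minimal Python sketch that computes a crude (unadjusted) odds ratio from two proportions. It is an illustration only: the function names are hypothetical, and the adjusted odds ratios in the abstract come from models that account for clinic clustering and patient characteristics, so a crude value is not expected to reproduce them.

```python
def odds(p: float) -> float:
    """Convert a proportion (strictly between 0 and 1) to odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return p / (1.0 - p)


def crude_odds_ratio(p_exposed: float, p_reference: float) -> float:
    """Unadjusted odds ratio comparing two proportions."""
    return odds(p_exposed) / odds(p_reference)


# Remeasurement proportions as reported in the Results of the abstract.
REMEASURED = {"high": 0.515, "low": 0.236, "usual_care": 0.062}

# Crude comparison of high- vs low-intensity implementation.
high_vs_low = crude_odds_ratio(REMEASURED["high"], REMEASURED["low"])
print(f"crude OR, high vs low intensity: {high_vs_low:.2f}")
```

With the reported remeasurement proportions, the crude high- vs low-intensity odds ratio is about 3.44. Any agreement or disagreement with an adjusted estimate should be read cautiously, because the crude calculation ignores clustering and covariates.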

Citations: 0
Life events extraction from healthcare notes for veteran acute suicide risk prediction.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-16 | DOI: 10.1093/jamia/ocaf197
Destinee Morrow, Rafael Zamora-Resendiz, Sayera Dhaubhadel, Jean C Beckham, Nathan A Kimbrel, Benjamin H McMahon, Silvia Crivelli

Objective: Predictive models of suicide risk have focused on features extracted from structured data found in electronic health records, with limited consideration of predisposing life events (LE) expressed in unstructured clinical text such as housing instability and marital troubles. This study aims to expand upon previous research, demonstrating how high-performance computing (HPC) and machine learning methodologies can be used to extract and annotate 8 LE across all Veterans Health Administration (VHA) unstructured clinical text data with enriched performance metrics. Integration of the 8 LE with the structured features using different statistical and machine learning (ML) methods is also discussed.

Materials/methods: VHA-wide clinical text from January 2000 to January 2022 was pre-processed and analyzed using HPC. Data-driven lexicon curation enabled a rule-based annotator to extract LE, followed by machine learning for improved positive predictive value (PPV). NLP results were analyzed longitudinally and then integrated and compared to a baseline statistical model predicting risk for a combined outcome (suicide death, suicide attempt and overdose).

Results: First-time LE mentions showed a significant temporal correlation with suicide-related events (SRE) (suicide ideation, attempt and/or death) and were not associated with administrative bias. Predictive linear regression (LR) models integrating NLP-derived LE showed an improved AUC of 0.81 and novel patient identification of up to 18%.

Discussion: Our analysis shows that these methodologies significantly improved performance metrics relative to previous work and outperformed related work. These results demonstrate that NLP-derived LE served as acute predictors of SRE.

Conclusion: NLP integration into predictive models may help improve clinician decision support. Future work is necessary to better define and integrate these and other potential LE.
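
The rule-based extraction step described in the Materials/methods can be sketched, under strong assumptions, as lexicon matching over note text. The Python example below is illustrative only: the lexicon terms, labels, and function names are hypothetical (the abstract does not list the study's curated lexicon), and it omits negation handling and the machine learning step the authors used to improve PPV.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Hypothetical mini-lexicon for two life events (LE); not the study's lexicon.
LEXICON: Dict[str, List[str]] = {
    "housing_instability": ["homeless", "evicted", "eviction", "lost housing"],
    "marital_troubles": ["divorce", "separated from spouse", "marital conflict"],
}


def compile_lexicon(lexicon: Dict[str, List[str]]) -> Dict[str, Pattern[str]]:
    """Build one case-insensitive, whole-word regex per life-event label."""
    patterns: Dict[str, Pattern[str]] = {}
    for label, terms in lexicon.items():
        # Longest terms first so multi-word phrases win over shorter overlaps.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        patterns[label] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def annotate(text: str, patterns: Dict[str, Pattern[str]]) -> List[Tuple[str, int, int, str]]:
    """Return (label, start, end, matched_text) tuples ordered by position."""
    hits: List[Tuple[str, int, int, str]] = []
    for label, pattern in patterns.items():
        for match in pattern.finditer(text):
            hits.append((label, match.start(), match.end(), match.group(0)))
    return sorted(hits, key=lambda hit: hit[1])


PATTERNS = compile_lexicon(LEXICON)
NOTE = "Veteran reports he was evicted last month and is going through a divorce."
for label, start, end, span in annotate(NOTE, PATTERNS):
    print(label, start, end, span)
```

A matcher like this has no notion of context, so a phrase such as "denies being homeless" would still be flagged; that limitation is one plausible reason a follow-up classifier would be needed to raise PPV.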

Citations: 0
Systematic review of foundation models for structured electronic health records.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-14 | DOI: 10.1093/jamia/ocag033
Lin Lawrence Guo, Santiago Eduardo Arciniegas, Adam Paul Yan, Jason Fries, George A Tomlinson, Lillian Sung

Purpose: Foundation models pretrained on structured electronic health record (EHR) data promise improved predictive performance, sample efficiency and resilience to distribution shifts. However, model design, scale and use remain unclear. Objectives were to characterize foundation models pretrained on structured EHR data; examine temporal trends in model application and scale, architecture and design; and assess the extent to which publications omitted methodological details.

Methods: We searched MEDLINE and Embase (2018-October 2025) for foundation models pretrained on structured EHR data using self-supervised learning and applied to clinical prediction tasks. Study selection and data abstraction were performed in duplicate. Characteristics were summarized and stratified by median publication year.

Results: Fifty-three studies were included; publications increased over time. Most datasets (79%) originated from the United States. None pretrained exclusively on pediatric cohorts. Model architecture shifted towards transformers (P = .013) with longer context windows (P = .028), while application shifted from exclusively embedding-based toward generative or mixed use (P < .001). Choices regarding feature inclusion, temporal representation, self-supervised objective and downstream adaptation remained heterogeneous. Only 26% of studies evaluated transfer to external datasets, and none described clinical deployment. Key indicators of scale and compute were frequently unreported.

Conclusions: EHR foundation models are proliferating and are increasingly transformer-based and generative. Yet methodological choices and reporting remain fragmented, indicating that design trade-offs and best practices for EHR foundation models have not yet been established. None describe clinical deployment. Future work should clarify which design choices improve performance, robustness and transferability, increase reporting transparency, and identify whether these models can be implemented to improve patient-important outcomes.

Citations: 0
Exploring approaches to computational representation and classification of user-generated meal logs.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-14 | DOI: 10.1093/jamia/ocaf200
Guanlan Hu, Adit Anand, Pooja M Desai, Iñigo Urteaga, Lena Mamykina

Objective: This study examined the use of machine learning (ML) and domain-specific enrichment in patient-generated health data, in the form of free-text meal logs, to classify meals on alignment with different nutritional goals.

Materials and methods: We used a dataset of over 3000 meal records collected by 114 individuals from a diverse, low-income community in a major US city using a mobile app. Registered dietitians (RDs) provided expert judgment for meal-goal alignment, used as the "gold-standard" for evaluation. Using text embeddings (TF-IDF and BERT) and domain-specific enrichment information (ontologies, ingredient parsers, and macronutrient contents) as inputs, we evaluated the performance of logistic regression and multilayer perceptron classifiers using accuracy, precision, recall, and F1 score against the gold standard and the individual's self-assessment.

Results: On average, individuals who logged meals achieved an accuracy of 0.576 in their meal-goal alignment self-assessments. Even without enrichment, ML outperformed individuals' self-assessments, with accuracies of 0.726-0.841 across goals. The best-performing combination of ML classifier and enrichment achieved even higher accuracies (0.814-0.902). In general, ML classifiers enriched with parsed ingredients, food entities, and macronutrient information performed well across multiple nutritional goals, but the impact of enrichment and classification algorithm on classification accuracy varied by nutritional goal.

Conclusion: ML can utilize unstructured free-text meal logs and reliably classify whether meals align with specific nutritional goals, exceeding individuals' self-assessments, especially when incorporating nutrition domain knowledge. Our findings highlight the potential of ML analysis of patient-generated health data to support patient-centered nutrition guidance in precision healthcare.
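
The evaluation described in the Materials and methods compares predicted meal-goal alignment labels against a dietitian gold standard using accuracy, precision, recall, and F1. The Python sketch below shows those four metrics for binary labels. The function name and the toy label vectors are invented for illustration and are not data from the study.

```python
from typing import Dict, Sequence


def binary_metrics(gold: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = meal aligns with goal)."""
    if len(gold) != len(predicted) or len(gold) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels (invented): dietitian gold standard, self-assessment, classifier output.
GOLD = [1, 0, 1, 1, 0, 0, 1, 0]
SELF_ASSESSED = [1, 1, 0, 1, 1, 0, 0, 0]
MODEL = [1, 0, 1, 1, 0, 1, 1, 0]

self_scores = binary_metrics(GOLD, SELF_ASSESSED)
model_scores = binary_metrics(GOLD, MODEL)
print("self-assessment:", self_scores)
print("classifier:", model_scores)
```

Scoring both the self-assessments and the classifier against the same gold labels, as done here, is what allows the two to be compared directly.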

Citations: 0
Accuracy of an XGBoost-based privacy preserving record linkage system compared with an electronic health record patient matching module in identifying patients shared between nearby academic health centers.
IF 4.6 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-13 | DOI: 10.1093/jamia/ocag020
Douglas S Bell, Tawny Saleh, Fernando Javier Sanz Vidorreta, Cenan N Pirani, Joshua M Pevnick, Robert A Jenders, Spencer L Soohoo

Objectives: Patients often receive health care from multiple organizations. Privacy Preserving Record Linkage (PPRL) is a technology for linking patient records without releasing personally identifiable information. We compared a commercial PPRL tool that uses the XGBoost machine learning algorithm with Care Everywhere (CE), a widely used rule-based patient linkage module.

Materials and methods: We matched the complete patient populations from Cedars-Sinai Health System and University of California, Los Angeles (UCLA) Health using the XGBoost PPRL tool at each of 3 score thresholds (98, 95, and 90), reflecting stricter vs more permissive matching. We compared PPRL matches with CE matches for the cohort of 849 157 patients who had been queried by CE from UCLA to Cedars-Sinai over 18 months. To classify proposed matches as false, uncertain or correct matches, 2 reviewers manually reviewed a random sample of 1200 patients representing each category of matches.

Results: Care Everywhere matched 18% of the cohort, whereas PPRL matched 9%, 27%, and 29% of the cohort using the 98, 95, and 90 thresholds, respectively. Projecting the false match rates from the manual review to the original populations, precision for CE was 99.6% (95% CI, 97.8%-100%). Precision for PPRL was 100% (95% CI, 99.2%-100%), 99.4% (95% CI, 97.4%-99.9%), and 98.7% (95% CI, 96.5%-99.4%) at the 3 thresholds, respectively. Using CE and PPRL matches together as a proxy gold standard, recall for CE was 61.5% (95% CI, 60.3%-61.9%) and for PPRL was 30.6% (95% CI, 30.3%-30.7%), 92.2% (95% CI, 90.2%-92.7%), and 96.8% (95% CI, 94.6%-97.5%) at each threshold, respectively.

Conclusions: The precision and recall of PPRL matching differed substantially across the available match thresholds. Compared with the rule-based system, PPRL at the 95 threshold had 50% higher recall with similar precision. Privacy Preserving Record Linkage holds promise for improving research, but users must choose the precision vs recall needed for their application.
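
The recall comparison above can be sketched in a few lines. This is a minimal illustration, not the study's procedure: the match pairs are hypothetical toy data, the function names are made up for this sketch, and the proxy gold standard is assumed to be the union of both systems' matches with every pair treated as correct. The last line checks that the reported "50% higher recall" is consistent with a relative difference between the reported recall figures.

```python
def precision_recall(proposed, gold):
    """Precision and recall of a set of proposed matches against a gold set."""
    true_pos = len(proposed & gold)
    precision = true_pos / len(proposed) if proposed else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


def relative_gain(baseline, new):
    """Relative change of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Hypothetical toy matches: (UCLA id, Cedars-Sinai id) pairs, not study data.
ce_matches = {("u1", "c1"), ("u2", "c2"), ("u3", "c3")}
pprl_matches = {("u2", "c2"), ("u3", "c3"), ("u4", "c4"), ("u5", "c5")}

# Proxy gold standard: every pair matched by either system (assumed correct here).
proxy_gold = ce_matches | pprl_matches

ce_precision, ce_recall = precision_recall(ce_matches, proxy_gold)
pprl_precision, pprl_recall = precision_recall(pprl_matches, proxy_gold)

# Recall figures reported in the abstract: CE 61.5%, PPRL at threshold 95 92.2%.
gain = relative_gain(61.5, 92.2)
```

With the reported figures, `gain` comes out just under 0.5, which matches the abstract's statement if "50% higher" is read as a relative rather than an absolute difference.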

Citations: 0
Current methods for analyzing time-series patient-generated health data to assess treatment response: a scoping review.
IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-03-11 DOI: 10.1093/jamia/ocag027
Michelo Banda, Sian Bladon, Mariam Al-Attar, Roberto Cahuantzi, David A Jenkins, William G Dixon, Sabine N van der Veer

Objectives: We aimed to identify and map recent studies using high-frequency, time-series electronic patient-generated health data (ePGHD) to assess treatment response; characterize ePGHD types and collection methods; summarize ePGHD-based definitions of treatment response; and describe analytical approaches used.

Materials and methods: We systematically searched 4 databases for articles published between January 2022 and June 2024, supplemented by a forward citation search until June 2025. Peer-reviewed studies were eligible if ePGHD were collected outside clinical settings, and either reported at least weekly (ie, if actively reported by participants) or summarized discretely (eg, daily) if passively collected via wearables/sensors. We screened articles for eligibility independently in duplicate and synthesized extracted data descriptively.

Results: Our search yielded 4030 articles, of which we included 186. Most studies collected ePGHD using mobile applications or webforms (n = 133) over 4-12 weeks (n = 67). Prior to analysis, 132 studies excluded portions or condensed ePGHD into one or more summaries. Among 172 studies estimating treatment response, 98 applied longitudinal methods (eg, mixed-effects models) that accounted for repeated measures while capturing within- and between-subject variations, whereas 74 used cross-sectional approaches. Of 18 prediction modeling studies, 16 employed machine learning techniques, with only 4 explicitly modeling repeated measures. Five studies identified clusters of response trajectories generally without incorporating temporal dependencies (eg, using K-means).

Discussion and conclusion: Many studies in this review did not fully leverage the high-frequency, longitudinal nature of ePGHD. Future research should adopt more appropriate and readily available analytic methods to maximize the potential of time-series ePGHD for generating insights into treatment response.
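
To make the point about condensed summaries concrete, the sketch below uses hypothetical daily symptom scores (not data from any reviewed study) for two participants whose weekly means are identical but whose trajectories run in opposite directions. It is not a mixed-effects model; it only shows, under those assumptions, what a single cross-sectional summary discards and a per-participant trend retains.

```python
from statistics import mean


def ols_slope(times, values):
    """Least-squares slope of values over times for one participant."""
    t_bar, v_bar = mean(times), mean(values)
    num = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


# Hypothetical daily symptom scores (0-10) over one week; not study data.
days = [0, 1, 2, 3, 4, 5, 6]
improving = [8, 7, 6, 5, 4, 3, 2]
worsening = [2, 3, 4, 5, 6, 7, 8]

# A single summary per participant cannot tell the two apart...
summary_improving = mean(improving)
summary_worsening = mean(worsening)

# ...whereas a per-participant trend keeps the direction of change.
slope_improving = ols_slope(days, improving)
slope_worsening = ols_slope(days, worsening)
```

Both summaries equal 5, while the slopes are -1.0 and 1.0 points per day.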

Citations: 0
MedRep: medical concept representations for general electronic health record foundation models.
IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-03-10 DOI: 10.1093/jamia/ocag032
Junmo Kim, Namkyeong Lee, Jiwon Kim, Kwangsoo Kim

Objective: Traditional electronic health record (EHR) foundation models fail to process unseen medical codes, limiting generalizability across institutions with different vocabularies. To address this problem, we introduce medical concept representation (MedRep), standardized medical concept representations for EHR foundation models, enabling recognition of semantically similar concepts regardless of their specific IDs.

Materials and methods: We utilized Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) vocabulary covering 7.5 million concepts from 66 medical vocabularies. MedRep integrates large language model-generated concept descriptions and OMOP graph ontology using graph contrastive learning with knowledge distillation. We evaluated MedRep-based models on MIMIC-IV (internal validation) and EHRSHOT (external validation) across 9 prediction tasks including clinical outcomes, phenotypes, and in-hospital events.

Results: MedRep consistently outperformed baseline models, particularly in external validation with average improvements of 0.088 in area under the receiver operating characteristic curve and 0.208 in area under the precision-recall curve. Qualitative analysis demonstrated that MedRep-based models identified more clinically relevant concepts when making decisions than the baseline models. Performance improvements remained stable across diverse EHR foundation model architectures, including BEHRT, Med-BERT, and CDM-BERT.

Discussion: MedRep improves the generalizability of EHR foundation models by encouraging similar concepts to have similar representations. EHR foundation models developed at different institutions could cooperate through MedRep, merging knowledge from multiple hospital datasets. In addition, our approach could reduce healthcare disparities by enabling smaller institutions to benefit from models trained on larger datasets.

Conclusion: MedRep improves EHR foundation model performance, interpretability, and generalizability, serving as a standard baseline representation for EHR foundation models adopting OMOP CDM.
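
The idea that similar concepts should have similar representations, so that a code unseen during training can still be related to known ones, can be illustrated with a nearest-neighbour lookup. The vectors, concept names, and the use of cosine similarity below are assumptions made for this sketch; the abstract does not say how MedRep representations are compared.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical 3-dimensional concept vectors; real representations would be learned.
known_concepts = {
    "type 2 diabetes (vocabulary A)": [0.9, 0.1, 0.0],
    "hip fracture (vocabulary A)": [0.0, 0.2, 0.9],
}

# Hypothetical vector for a diabetes code from a vocabulary the model never saw.
unseen_code = [0.8, 0.2, 0.1]

# Pick the known concept whose vector points in the most similar direction.
nearest = max(
    known_concepts,
    key=lambda name: cosine(known_concepts[name], unseen_code),
)
```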

Citations: 0
Interactive active learning for literature screening: finetuning GPT with DeepSeek reasoning for cross-domain generalization.
IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-03-09 DOI: 10.1093/jamia/ocag014
Yiming Li, Joseph M Plasek, Xinsong Du, Yifei Wang, Zhengyang Zhou, John Lian, Ya-Wen Chuang, Pengyu Hong, Peter C Hou, Li Zhou

Objective: Automated literature screening in biomedical research is often hindered by domain shifts and scarcity of labeled data, which limit model accuracy and generalizability. While large language models (LLMs) perform well in zero-shot settings, they often fail to capture complex, domain-specific reasoning patterns. To address this limitation, this study investigates whether an interactive, weakly supervised learning framework that combines the fine-tuning adaptability of GPT (generative pre-trained transformer) models with the reasoning capabilities of DeepSeek can improve literature screening performance across biomedical domains.

Materials and methods: We developed an active learning framework that leverages model disagreement between GPT-4o and DeepSeek to improve literature screening performance. This process began with a labeled corpus of 6331 articles on large language models, from which a model disagreement analysis was performed to identify cases where GPT-4o misclassified and DeepSeek produced correct predictions. Three GPT variants (GPT-4o, GPT-4o-mini, and GPT-4.1-nano) were fine-tuned under standard supervised learning settings using these disagreement-based samples. Fine-tuning prompts incorporated classification labels and, when available, rationale traces generated by DeepSeek to provide reasoning-augmented weak supervision. Model performance was evaluated on an independent benchmark set of 291 annotated articles across 10 topic queries in cancer immunotherapy and LLMs in medicine, using standard evaluation metrics, with recall as the primary measure.

Results: Fine-tuning GPT models using disagreement-based examples significantly improved performance. GPT-4o-mini achieved the best overall results after fine-tuning, especially with the highest F1 score (0.93, P < .001) and recall (0.95, P < .001). Across the biomedical topics, fine-tuned models consistently outperformed their zero-shot counterparts without increasing reviewer workload.

Discussion: These findings demonstrate the effectiveness of disagreement-driven active learning in enhancing GPT-based biomedical literature screening. Lightweight models like GPT-4o-mini benefit most from targeted, reasoning-enriched training, highlighting their suitability for scalable deployment.

Conclusion: This study introduces an interactive active learning framework that leverages fine-tuned LLMs with reasoning capabilities to enhance literature screening. The approach offers a scalable solution to more efficient and reliable information retrieval in systematic reviews.
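
The disagreement-based sample selection described in the methods can be sketched as a simple filter. Everything below is illustrative: the record fields, the toy decisions, and the shape of the resulting fine-tuning examples are assumptions for this sketch, and no model or API calls are made.

```python
def disagreement_samples(records):
    """Keep records where the first model is wrong and the second is right."""
    return [
        r for r in records
        if r["model_a"] != r["label"] and r["model_b"] == r["label"]
    ]


# Hypothetical screening decisions (1 = include, 0 = exclude); not study data.
# "model_a" stands in for the model to be fine-tuned, "model_b" for the reasoning model.
records = [
    {"id": "a1", "label": 1, "model_a": 0, "model_b": 1, "rationale": "reports an LLM evaluation"},
    {"id": "a2", "label": 0, "model_a": 0, "model_b": 0, "rationale": None},
    {"id": "a3", "label": 1, "model_a": 1, "model_b": 0, "rationale": None},
    {"id": "a4", "label": 0, "model_a": 1, "model_b": 0, "rationale": None},
]

selected = disagreement_samples(records)

# Build fine-tuning examples, attaching the rationale only when one is available.
examples = []
for r in selected:
    example = {"id": r["id"], "target": r["label"]}
    if r["rationale"]:
        example["rationale"] = r["rationale"]
    examples.append(example)
```

Only records `a1` and `a4` survive the filter, and only `a1` carries a rationale into its fine-tuning example.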

Citations: 0
Fairness aware subset selection for advancing equity in skin cancer detection.
IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-03-09 DOI: 10.1093/jamia/ocag028
Yehuda Perry, Abdulaziz A Almuzaini, Adewole S Adamson, Bahar Dasgeb, David J Foran, Vivek K Singh

Objectives: Skin cancer is the most common malignancy in the United States, with more than five million cases diagnosed annually among 3.3 million individuals. Melanoma, the deadliest form of skin cancer, accounts for roughly 200 000 new diagnoses each year and nearly 10 000 deaths. AI-based skin cancer detection is being developed and tested in laboratory and academic settings as a promising approach to improve access and reduce disparities. However, current models often underperform on darker skin tones (Fitzpatrick Types V and VI), creating fairness concerns that must be addressed prior to clinical deployment. Existing fairness-aware methods focus on algorithmic adjustments while neglecting data quality and representation. We introduce FAIR-SCAN (Fairness and Accuracy through Ranking-Based Subset Selection for Skin Cancer Detection), a data-centric framework that enhances fairness through subset selection guided by marginal contribution score (MCS) estimation.

Materials and methods: FAIR-SCAN ranks data points by their contribution to both accuracy and fairness, then selects an optimal subset for training. We evaluated its effectiveness using images from Diverse Dermatology Images (DDI) and Fitzpatrick 17K.

Results: FAIR-SCAN improved balance in accuracy, True Positive Rate, and False Positive Rate across skin tones while reducing the training dataset by 50%, outperforming algorithm-focused fairness methods.

Discussion: These findings highlight the importance of strategic data selection in mitigating bias in AI-driven diagnostics. FAIR-SCAN's data-centric approach enhances both precision and equity in skin cancer detection.

Conclusion: Strategic data selection is critical for equitable AI-driven diagnostics. FAIR-SCAN advances fairness and accuracy in skin cancer detection, supporting development of trustworthy clinical AI systems.
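
Ranking-based subset selection, in its simplest form, amounts to sorting data points by a score and keeping the top share. The sketch below assumes the per-image contribution scores have already been estimated; the abstract does not describe how FAIR-SCAN computes them, so the scores, image ids, and function name here are hypothetical.

```python
def select_subset(scores, keep_fraction=0.5):
    """Return ids of the highest-scoring data points, best first."""
    n_keep = int(len(scores) * keep_fraction)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]


# Hypothetical per-image contribution scores (higher = more helpful); not study data.
contribution = {"img1": 0.30, "img2": -0.10, "img3": 0.05, "img4": 0.22}

# Keep the top half, mirroring the 50% reduction in training data reported above.
training_subset = select_subset(contribution)
```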

Citations: 0