Pub Date: 2025-10-23 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf113
Cong Liu, Katherine D Crew, Jennifer Morse, Jodell E Linder, Antonis C Antoniou, Tim Carver, Josh Cortopassi, Josh F Peterson, Casey N Ta, Christin Hoell, Cynthia Prows, Eimear E Kenny, Emily Miller, Emma Perez, Gail P Jarvik, Harris T Bland, Jacqueline A Odgis, Kathleen F Mittendorf, Katherine E Bonini, Kyle McGuffin, Leah C Kottyan, Mary Maradik, Nita Limdi, Noura S Abul-Husn, Priya N Marathe, Sabrina A Suckiel, Sienna Aguilar, Toni J Lewis, Wei-Qi Wei, Yuan Luo, Robert R Freimuth, Hakon Hakonarson, Chunhua Weng, Wendy K Chung, Georgia L Wiesner
Objectives: To implement an automated multi-institutional pipeline that delivers breast-cancer risk integrated with polygenic risk scores, monogenic variants, family history, and clinical factors, emphasizing operational challenges and their solutions.
Materials and methods: A five-stage process was executed at ten sites. Data streams from REDCap surveys, PRS and monogenic reports, and MeTree pedigrees were normalized and forwarded through a REDCap plug-in to the CanRisk API.
Results: Integrated risk was returned to >10 000 women; 3.6% had ≥25% lifetime risk and 0.9% carried pathogenic variants. Pipeline-generated scores aligned well with manually generated ones. Major barriers, including heterogeneous pedigree formats, missing data, edge-case handling, and evolving model versions, were identified and resolved through mapping rules, imputation, and iterative testing.
Discussion: Cross-platform data harmonization and stakeholder alignment were decisive for success. Borderline-risk communication and model-version drift remain open issues.
Conclusion: Large-scale PRS-integrated breast-cancer risk reporting is feasible but requires robust interoperability standards and iterative governance.
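A minimal, hypothetical sketch of the payload assembly such a pipeline performs before calling the CanRisk API: merging REDCap survey fields, the PRS report, and the monogenic report, while flagging missing clinical fields for imputation. All field names here are illustrative assumptions, not the study's code; the real CanRisk API consumes CanRisk-format pedigree files rather than this dict.

```python
def build_risk_payload(survey, prs, monogenic, pedigree_txt):
    """Merge normalized inputs into one request body (illustrative only).

    survey:       dict of clinical factors from a REDCap export
    prs:          dict with the polygenic risk score report
    monogenic:    dict with the monogenic variant report
    pedigree_txt: family-history pedigree exported from MeTree
    """
    # Fields the survey failed to capture get flagged rather than dropped,
    # mirroring the mapping-and-imputation rules described above.
    missing = [k for k in ("age", "menarche_age") if survey.get(k) is None]
    return {
        "pedigree": pedigree_txt,
        "prs_zscore": prs.get("zscore"),
        "pathogenic_variant": monogenic.get("gene"),  # eg "BRCA1", or None
        "clinical": {k: v for k, v in survey.items() if v is not None},
        "imputed_fields": missing,
    }
```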
Title: Implementing integrated genomic risk assessments for breast cancer: lessons learned from the Electronic Medical Records and Genomics study. JAMIA Open. 2025;8(5):ooaf113. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12552095/pdf/
Pub Date: 2025-10-21 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf119
Alexandre Niset, Ines Melot, Margaux Pireau, Alexandre Englebert, Nathan Scius, Julien Flament, Salim El Hadwe, Mejdeddine Al Barajraji, Henri Thonon, Sami Barrit
Objective: To evaluate predictive diagnostic performance of open- and closed-source large language models (LLMs) in emergency medicine, addressing the urgent need for innovative clinical decision support tools amid rising patient volumes and staffing shortages.
Materials and methods: We generated 2370 AI-driven diagnostic predictions (Top-5 diagnoses from each of 6 model pipelines per patient), using data from 79 real-world emergency department cases collected consecutively during a 24-hour peak influx period at a tertiary care center. Pipelines combined open- and closed-source embedding models (text-embedding-ada-002, MXBAI) with foundational models (GPT-4, Llama3, and Qwen2) grounded via retrieval-augmented generation using emergency medicine textbooks. Models' predictions were assessed against reference diagnoses established by expert consensus.
Results: All pipelines achieved comparable diagnostic match rates (62.03%-72.15%). Diagnostic performance was significantly influenced by case characteristics: match rates were notably higher for specific versus unspecific diagnoses (85.53% vs 31.41%, P < .001) and surgical versus medical cases (79.49% vs 56.25%, P < .001). Open-source models demonstrated markedly superior sourcing capabilities compared to GPT-4-based combinations (P < 1.4e-12), with the MXBAI/Qwen2 pipeline achieving perfect citation verification.
Discussion: Diagnostic accuracy primarily depended on case characteristics rather than the choice of model pipeline, highlighting fundamental AI alignment challenges in clinical reasoning. Low performance in unspecific diagnoses underscores inherent complexities in clinical definitions rather than technological shortcomings alone.
Conclusion: Open-source LLM pipelines provide enhanced sourcing capabilities, crucial for transparent clinical decision-making and interpretability. Further research should expand knowledge bases to include hospital guidelines and regional epidemiology, while exploring on-premises solutions to better align with privacy regulations and clinical integration.
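The headline metric above is a Top-5 match rate: a case counts as a hit when the expert reference diagnosis appears anywhere among a pipeline's five ranked predictions. A small sketch (the function name is ours, not the study's):

```python
def top5_match_rate(predictions, references):
    """predictions: one Top-5 diagnosis list per case; references: gold labels."""
    hits = sum(ref in top5 for top5, ref in zip(predictions, references))
    return hits / len(references)

# The study's 2370 predictions: 79 cases x 6 pipelines x 5 ranked diagnoses.
assert 79 * 6 * 5 == 2370
```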
Title: Grounded large language models for diagnostic prediction in real-world emergency department settings. JAMIA Open. 2025;8(5):ooaf119. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12539180/pdf/
Pub Date: 2025-10-21 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf101
Camille Nebeker, Jean Christophe Bélisle-Pipon, Benjamin X Collins, Ashley Cordes, Kadija Ferryman, Brian J McInnis, Shannon K McWeeney, Laurie L Novak, Susannah Rose, Joseph M Yracheta, Ishan C Williams, Xiaoqian Jiang, Ellen W Clayton, Bradley A Malin
Objective: The Bridge2AI program is establishing rules of practice for creating ethically sourced health data repositories to support the effective use of ML/AI in biomedical and behavioral research. Given the initially undefined nature of ethically sourced data, this work concurrently developed definitions and guidelines alongside repository creation, grounded in a practical, operational framework.
Materials and methods: A Value Sensitive Design (VSD) approach was used to explore ethical tensions across stages of health data repository development. The conceptual investigation drew from supply chain management (SCM) processes to (1) identify actors who would interact with or be affected by the data repository use and outcomes; (2) determine what values to consider (ie, traceability, accountability, security); and (3) analyze and document value trade-offs (ie, balancing risks of harm against improvements in healthcare). This SCM framework provides operational guidance for managing complex, multi-source data flows with embedded bias mitigation strategies.
Results: This conceptual investigation identified the actors, values, and tensions that influence ethical sourcing when creating a health data repository. The SCM steps provide a scaffolding to support ethical sourcing across the pre-model stages of health data repository development. Ethical sourcing includes documenting data provenance, articulating expectations for experts, and practices for ensuring data privacy, equity, and public benefit. Challenges include risks of ethics washing, highlighting the need for transparent, value-driven practices.
Discussion: Integrating VSD with SCM frameworks enables operationalization of ethical values, improving data integrity, mitigating biases, and enhancing trust. This approach highlights how foundational decisions influence repository quality and AI/ML system usability, addressing provenance, traceability, redundancy, and risk management central to ethical data sourcing.
Conclusion: To create authentic, impactful health data repositories that serve public health goals, organizations must prioritize transparency, accountability, and operational frameworks like SCM that comprehensively address the complexities and risks inherent in data stewardship.
Title: Ethical sourcing in the context of health data supply chain management: a value sensitive design approach. JAMIA Open. 2025;8(5):ooaf101. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12539179/pdf/
Pub Date: 2025-10-16 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf118
Larry Ma, Alan V Rincon, Joshua Ide, Katrina O'Hara, Rachel Weinstein, Sebastien Hannay, Lucie Keunen, Vincent Keunen, Ivelina Popova, Sherry Yan
Objective: To develop and evaluate a patient-centric medication module within a personal health record (PHR) app for capturing medication use, focusing on accuracy, usability, and concordance.
Materials and methods: The medication module offered 4 entry methods: picklist, National Drug Code (NDC), free-text, and portal import, with the first 2 leveraging RxNorm and openFDA APIs. Patients from an integrated delivery network (IDN) created medication lists and recorded daily use in the app's diary. Pharmacists evaluated medication accuracy by reviewing patient-uploaded medication images. Usability was measured using the System Usability Scale (SUS). Concordance was assessed by comparing Electronic Health Records (EHR) with diary entries.
Results: Over a 14-day period, 85 patients entered 617 medications, with 533 logged in the diary representing current use. Picklist was the most used entry method. Overall medication entry accuracy was 92% (picklist 97%; NDC 87%; free-text 84%; and portal import 100%). The mean system usability score was 56.5 for the study app (patients) and 80.8 for the medication module (pharmacists). EHR concordance with diary entries was low (25% using the 14-day window; 53% using a 1-year window); most unmatched entries were over-the-counter (OTC) medications.
Discussion: Accurate and complete medication records are essential for the safe and effective use of medications. This patient-centric medication module supported accurate capture of prescription and OTC medications. Gaps in EHR data highlight the need to improve medication record accuracy and reconciliation.
Conclusion: Patient-generated health data can have a central role in creating the "Best Possible Medication History" envisioned by the World Health Organization.
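One concrete detail behind NDC-based entry: 10-digit hyphenated NDCs come in 4-4-2, 5-3-2, and 5-4-1 formats, and are conventionally zero-padded to an 11-digit 5-4-2 form before cross-database matching. A minimal sketch (the helper is ours, not the module's code):

```python
def normalize_ndc(ndc):
    """Pad a 10-digit hyphenated NDC to the 11-digit 5-4-2 form."""
    labeler, product, package = ndc.split("-")
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)
```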
Title: Development and evaluation of a patient-centric approach for accurate medication capture. JAMIA Open. 2025;8(5):ooaf118. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12530325/pdf/
Pub Date: 2025-10-16 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf117
Alison M Stuebe, Randall Blanco, Michael Horvath, Mohammad Golam Kibria, Lauren Kucirka, Karl Shieh, David Page, Metin N Gurcan, William Ed Hammond
Objective: Severe maternal morbidity and mortality are higher in the United States than in other high-income countries, and unacceptable disparities persist. To facilitate research on these outcomes, we developed a standardized approach for extracting perinatal data from electronic health records (EHRs).
Materials and methods: To support data model building and validation, we harmonized perinatal EHR data in a common data model, building on lessons learned from multiple prior projects.
Results: We developed an Entity Relationship Diagram (ERD) that aggregates perinatal EHR data at appropriate granularities (ie, mothers, infants, encounters) with indexing of observations to gestational age and time from delivery. We then developed a standard approach to extract, transform, and load pregnancy-related observations from EHRs for inclusion in PCORnet® Common Data Model tables.
Discussion: Our ERD can facilitate cross-institutional research to identify populations at risk and prompt interventions to improve perinatal outcomes.
Conclusion: A structured approach can accelerate the use of EHR data for perinatal research.
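Indexing observations to gestational age, as the ERD does, reduces to date arithmetic once gestational age at delivery is known; a hedged sketch (ours, not the project's ETL code):

```python
from datetime import date

def ga_days_at(observation_date, delivery_date, ga_at_delivery_days):
    """Gestational age (in days) at the time of a prenatal observation."""
    return ga_at_delivery_days - (delivery_date - observation_date).days
```

Observations after delivery would instead be indexed as time from delivery, the ERD's other axis.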
Title: Development and implementation of an entity relationship diagram for perinatal data. JAMIA Open. 2025;8(5):ooaf117. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12530323/pdf/
Pub Date: 2025-10-16 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf122
Kaden McKeen, Sameer Masood, Augustin Toma, Barry Rubin, Bo Wang
Objectives: To develop ECG-FM, an open-weight foundation model for electrocardiogram (ECG) analysis, rigorously evaluate its performance on clinically salient tasks, and openly release it alongside a public benchmark.
Materials and methods: In a study using 1.5 million 12-lead ECGs, we present ECG-FM, a transformer-based foundation model pretrained with hybrid self-supervision that combines masked reconstruction and contrastive learning with ECG-specific augmentation. Downstream, we evaluate multi-label ECG interpretation and prediction of reduced left ventricular ejection fraction (LVEF), introducing an openly available benchmark on the MIMIC-IV-ECG dataset. We assess ECG-FM's capabilities through data scaling experiments, latent-space structure analysis, and attention-based saliency.
Results: Finetuned ECG-FM models outperform task-specific baselines in the small-to-medium-scale data regime, exhibit strong label efficiency and cross-dataset generalizability, and achieve high AUROC on salient labels, including atrial fibrillation (0.996) and LVEF ≤40% (0.929). The pretrained encoder showcases competitive linear probing performance, with functionally discriminative embeddings.
Discussion: Findings indicate that ECG-FM is generalizable, label-efficient, and discriminative for screening, risk stratification, and monitoring. Its representations capture low-level morphology and high-order cardiac semantics, and the pretrained encoder serves as a robust feature-set generator. This work mitigates reliance on large labeled datasets, reduces compute and data requirements, and lowers barriers to reproducibility and cross-study comparison.
Conclusion: ECG-FM is an open, rigorously validated ECG foundation model intended to accelerate transparent, comparable research in the ECG analysis subfield. It is designed for rapid integration and evaluation, especially for delivering practical gains in low-label settings. We release our code, model weights, tutorials, and benchmark at https://github.com/bowang-lab/ECG-FM/.
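For reference, the AUROC figures quoted above are rank statistics (the probability that a positive case outscores a negative one); a compact, dependency-free sketch with ties counted as half-wins:

```python
def auroc(labels, scores):
    """Pairwise AUROC over binary labels (1 = positive, 0 = negative)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    # Count positive-vs-negative comparisons the positive wins (ties = 0.5).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```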
Title: ECG-FM: an open electrocardiogram foundation model. JAMIA Open. 2025;8(5):ooaf122. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12530324/pdf/
Pub Date: 2025-10-16 | DOI: 10.1093/jamiaopen/ooaf120
Cristian Estupiñán-Ojeda, Raúl J Sandomingo-Freire, Lluís Padró, Jordi Turmo
Objectives: Joint recognition and ICD-10 linking of diagnoses in bilingual, non-standard Spanish and Catalan primary care notes is challenging. We evaluate parameter-efficient fine-tuning (PEFT) techniques as a resource-conscious alternative to full fine-tuning (FFT) for multi-label clinical text classification.
Materials and methods: On a corpus of 21 812 Catalan and Spanish clinical notes from Catalonia, we compared the PEFT techniques LoRA, DoRA, LoHA, LoKR, and QLoRA applied to multilingual transformers (BERT, RoBERTa, DistilBERT, and mDeBERTa).
Results: FFT delivered the best strict Micro-F1 (63.0), but BERT-QLoRA scored 62.2, only 0.8 points lower, while reducing trainable parameters by 67.5% and memory by 33.7%. Training on combined bilingual data consistently improved generalization across individual languages.
Discussion: The small FFT margin was confined to rare labels, indicating limited benefit from updating all parameters. Among PEFT techniques, QLoRA offered the strongest accuracy-efficiency balance; LoRA and DoRA were competitive, whereas LoHA and LoKR incurred larger losses. Adapter rank mattered: ranks below 128 sharply degraded Micro-F1. The substantial memory savings enable deployment on commodity GPUs while delivering performance very close to FFT.
Conclusion: PEFT, particularly QLoRA, supports accurate and memory-efficient joint entity recognition and ICD-10 linking in multilingual, low-resource clinical settings.
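The savings behind these results are easy to reason about: a rank-r LoRA adapter trains r·(d_in + d_out) parameters per weight matrix instead of the full d_in·d_out. A sketch of the per-matrix arithmetic (note the abstract's 67.5% figure is model-wide, where embeddings and untouched layers dilute the saving):

```python
def lora_savings(d_in, d_out, r):
    """Adapter parameter count and fractional reduction vs full fine-tuning
    of a single d_in x d_out weight matrix."""
    full = d_in * d_out
    adapter = r * (d_in + d_out)  # A: d_in x r, plus B: r x d_out
    return adapter, 1 - adapter / full
```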
Title: High-fidelity parameter-efficient fine-tuning for joint recognition and linking of diagnoses to ICD-10 in non-standard primary care notes. JAMIA Open. 2025;8(5):ooaf120. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12530322/pdf/
Pub Date : 2025-10-08eCollection Date: 2025-10-01DOI: 10.1093/jamiaopen/ooaf104
Asiful Arefeen, Simar Singh, Crystal Razavi, Hassan Ghasemzadeh, Sandesh Dev
Objectives: Despite the rapid development of AI in clinical medicine, reproducibility and methodological limitations hinder its clinical utility. In response, MINimum Information for Medical AI Reporting (MINIMAR) standards were introduced to enhance publication standards and reduce bias, but their application remains unexplored. In this review, we sought to assess the quality of reporting in AI/ML studies of cardiac amyloidosis (CA), an increasingly important cause of heart failure.
Materials and methods: Using PRISMA-ScR guidelines, we performed a scoping review of English-language articles published through May 2023 that applied AI/ML techniques to diagnose or predict CA. Non-CA studies and those with selective feature sets were excluded. Two researchers independently screened and extracted data. In all, 20 studies met criteria and were assessed for adherence to MINIMAR standards.
Results: The studies showed variable compliance with MINIMAR. Most reported participant age (90%) and gender (85%), but only 25% included ethnic or racial data, and none provided socioeconomic details. The majority (95%) developed diagnostic models, yet only 85% clearly described training features, and 20% addressed missing data. Model evaluation revealed gaps; 80% reported internal validation, but only 20% conducted external validation.
Discussion and conclusion: This study, one of the first to apply MINIMAR criteria to ML research in CA, reveals significant variability and deficiencies in reporting, particularly in patient demographics, model architecture, and evaluation. These findings underscore the need for stricter adherence to standardized reporting guidelines to enhance the reliability, generalizability, and clinical applicability of ML/AI models in CA.
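The per-item adherence percentages above come down to scoring each study against the checklist and tallying. A minimal sketch with hypothetical extraction records (not the review's actual data):

```python
from collections import Counter

# Hypothetical extraction records: the set of satisfied MINIMAR items per study.
studies = [
    {"age", "gender", "training_features"},
    {"age", "gender", "race_ethnicity", "external_validation"},
    {"age", "training_features", "missing_data"},
    {"age", "gender"},
]

counts = Counter(item for items in studies for item in items)
n = len(studies)
adherence = {item: 100 * c / n for item, c in counts.items()}
print(adherence["age"])  # all four hypothetical studies report age -> 100.0
```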
{"title":"Assessing the quality of reporting in artificial intelligence/machine learning research for cardiac amyloidosis.","authors":"Asiful Arefeen, Simar Singh, Crystal Razavi, Hassan Ghasemzadeh, Sandesh Dev","doi":"10.1093/jamiaopen/ooaf104","DOIUrl":"10.1093/jamiaopen/ooaf104","url":null,"abstract":"<p><strong>Objectives: </strong>Despite the rapid development of AI in clinical medicine, reproducibility and methodological limitations hinder its clinical utility. In response, MINimum Information for Medical AI Reporting (MINIMAR) standards were introduced to enhance publication standards and reduce bias, but their application remains unexplored. In this review, we sought to assesses the quality of reporting in AI/ML studies of cardiac amyloidosis (CA) an increasingly important cause of heart failure.</p><p><strong>Materials and methods: </strong>Using PRISMA-ScR guidelines, we performed a scoping review of English-language articles published through May 2023 which applied AI/ML techniques to diagnose or predict CA. Non-CA studies and those with selective feature sets were excluded. Two researchers independently screened and extracted data. In all, 20 studies met criteria and were assessed for adherence to MINIMAR standards.</p><p><strong>Results: </strong>The studies showed variable compliance with MINIMAR. Most reported participant age (90%) and gender (85%), but only 25% included ethnic or racial data, and none provided socioeconomic details. The majority (95%) developed diagnostic models, yet only 85% clearly described training features, and 20% addressed missing data. Model evaluation revealed gaps; 80% reported internal validation, but only 20% conducted external validation.</p><p><strong>Discussion and conclusion: </strong>This study, one of the first to apply MINIMAR criteria to ML research in CA, reveals significant variability and deficiencies in reporting, particularly in patient demographics, model architecture, and evaluation. 
These findings underscore the need for stricter adherence to standardized reporting guidelines to enhance the reliability, generalizability, and clinical applicability of ML/AI models in CA.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 5","pages":"ooaf104"},"PeriodicalIF":3.4,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12507469/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145259548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-10-03eCollection Date: 2025-10-01DOI: 10.1093/jamiaopen/ooaf116
Peter J Hoover, Terri L Blumke, Anna D Ware, Malvika Pillai, Zachary P Veigulis, Catherine M Curtin, Thomas F Osborne
Objective: To develop a more accurate fall prediction model within the Veterans Health Administration.
Materials and methods: The cohort included Veterans admitted to a Veterans Health Administration acute care setting from July 1, 2020, to June 30, 2022, with a length of stay between 1 and 7 days. Demographic and clinical data were obtained through electronic health records. Veterans were identified as having a documented fall through clinical progress notes. A transformer model was used to derive features from these data, which were then used to train a Light Gradient-Boosting Machine for classification and prediction. Area under the precision-recall curve assisted in model tuning, with geometric mean used to define an optimal classification threshold.
Results: Among 242,844 Veterans assessed, 5965 (2.5%) were documented as having a fall during their clinical stay. Employing a transformer model with a Light Gradient-Boosting Machine resulted in an area under the curve of .851 and an area under the precision-recall curve of .285. With an accuracy of 76.3%, the model resulted in a specificity of 76.2% and a sensitivity of 77.3%.
Discussion: Prior evaluations have highlighted limitations of the Morse Fall Scale (MFS) in accurately assessing fall risk. Our time series classification model, developed using existing electronic health record data, outperformed traditional MFS-based evaluations and other fall-risk models. Future work is necessary to address limitations, including class imbalance and the need for prospective validation.
Conclusion: An improvement over the MFS, this model, automatically calculated from existing data, can provide a more efficient and accurate means for identifying patients at risk of fall.
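The threshold-selection step described in the methods — choosing the operating point that maximizes the geometric mean of sensitivity and specificity along the ROC curve — can be sketched as follows. This uses scikit-learn's gradient boosting as a stand-in for the Light Gradient-Boosting Machine and synthetic imbalanced data, not the VHA cohort:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic imbalanced cohort (~2.5% positives, mirroring the fall rate).
X, y = make_classification(n_samples=4000, weights=[0.975], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]

# Geometric mean of sensitivity (TPR) and specificity (1 - FPR) at each
# candidate threshold; the maximizer becomes the classification cutoff.
fpr, tpr, thresholds = roc_curve(y_te, probs)
gmeans = np.sqrt(tpr * (1 - fpr))
best = thresholds[np.argmax(gmeans)]
preds = (probs >= best).astype(int)
```

With heavy class imbalance, the geometric-mean criterion avoids the degenerate majority-class threshold that raw accuracy would favor.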
{"title":"Predicting falls using electronic health records: a time series approach.","authors":"Peter J Hoover, Terri L Blumke, Anna D Ware, Malvika Pillai, Zachary P Veigulis, Catherine M Curtin, Thomas F Osborne","doi":"10.1093/jamiaopen/ooaf116","DOIUrl":"10.1093/jamiaopen/ooaf116","url":null,"abstract":"<p><strong>Objective: </strong>To develop a more accurate fall prediction model within the Veterans Health Administration.</p><p><strong>Materials and methods: </strong>The cohort included Veterans admitted to a Veterans Health Administration acute care setting from July 1, 2020, to June 30, 2022, with a length of stay between 1 and 7 days. Demographic and clinical data were obtained through electronic health records. Veterans were identified as having a documented fall through clinical progress notes. A transformer model was used to obtain features of this data, which was then used to train a Light Gradient-Boosting Machine for classification and prediction. Area under the precision-recall curve assisted in model tuning, with geometric mean used to define an optimal classification threshold.</p><p><strong>Results: </strong>Among 242,844 Veterans assessed, 5965 (2.5%) were documented as having a fall during their clinical stay. Employing a transformer model with a Light Gradient-Boosting Machine resulted in an area under the curve of .851 and an area under the precision-recall curve of .285. With an accuracy of 76.3%, the model resulted in a specificity of 76.2% and a sensitivity of 77.3%.</p><p><strong>Discussion: </strong>Prior evaluations have highlighted limitations of the Morse Fall Scale (MFS) in accurately assessing fall risk. Developing a time series classification model using existing electronic health record data, our model outperformed traditional MFS-based evaluations and other fall-risk models. 
Future work is necessary to address limitations, including class imbalance and the need for prospective validation.</p><p><strong>Conclusion: </strong>An improvement over the MFS, this model, automatically calculated from existing data, can provide a more efficient and accurate means for identifying patients at risk of fall.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 5","pages":"ooaf116"},"PeriodicalIF":3.4,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12492486/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145233301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-10-03eCollection Date: 2025-10-01DOI: 10.1093/jamiaopen/ooaf112
Amanda Massmann, Natasha J Petry, Max Weaver, Halle Brady, Roxana A Lupu
Objectives: This study evaluates response rates for pharmacogenomics (PGx)-guided nonsteroidal anti-inflammatory drug (NSAID) clinical decision support (CDS) alerts at Sanford Health from May 2020 to December 2024.
Materials and methods: A retrospective analysis was conducted on PGx NSAID interruptive alerts. Response options were classified into five categories: (1) continuation of the triggering NSAID order, (2) dose modification, (3) alternative NSAID ordered without PGx implications, (4) alternative analgesic (ie, opioid) ordered, and (5) discontinuation of the NSAID without alternative therapy.
Results: The study analyzed 2361 alert instances from 978 patients. The most common response was discontinuing NSAID without alternative therapy (43%). Dose modifications and orders for alternative analgesics comprised 2.57% and 14.67% of responses, respectively. The initial acceptance rate was 62.6%. Prior NSAID use significantly impacted override rates (60% vs 40%, P < .001). A 409-day breaking point was observed to affect alert acceptance rates, with the highest acceptance in NSAID naïve patients (96.1%).
Discussion: PGx NSAIDs CDS alert acceptance rates were higher compared to general CDS acceptance rates. This study highlights opportunities for continuous improvement including optimizing alert modality, modifying alert criteria to include look-back periods, and implementing genetically adapted ordersets.
Conclusion: The initial acceptance rate of PGx NSAID CDS alerts was 62.6%, with significantly higher acceptance among NSAID-naïve patients (62.6% vs 96.1%, P < .001). Integration of CDS is vital to the successful implementation of PGx in clinical practice.
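The five response categories map directly onto an acceptance computation, assuming here (our reading, not stated explicitly in the abstract) that only continuation of the triggering order counts as an override and the other four categories count as acceptance. A minimal sketch with a hypothetical alert-response log, not Sanford's data:

```python
from collections import Counter

CATEGORIES = {
    1: "continue triggering NSAID order",          # treated as override here
    2: "dose modification",
    3: "alternative NSAID without PGx implications",
    4: "alternative analgesic (eg, opioid)",
    5: "discontinue NSAID, no alternative therapy",
}

# Hypothetical alert-response log: one category code per alert instance.
responses = [1, 5, 5, 2, 4, 1, 5, 3, 5, 4]

counts = Counter(responses)
n = len(responses)
acceptance_rate = 100 * (n - counts[1]) / n
print(f"acceptance: {acceptance_rate:.1f}%")  # 80.0% for this toy log
```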
{"title":"Analyzing the impact of pharmacogenomics-guided nonsteroidal anti-inflammatory drug alerts in clinical practice.","authors":"Amanda Massmann, Natasha J Petry, Max Weaver, Halle Brady, Roxana A Lupu","doi":"10.1093/jamiaopen/ooaf112","DOIUrl":"10.1093/jamiaopen/ooaf112","url":null,"abstract":"<p><strong>Objectives: </strong>This study evaluates response rates of pharmacogenomics (PGx) nonsteroidal anti-inflammatory drugs (NSAIDs) clinical decision support (CDS) alerts at Sanford Health from May 2020 to December 2024.</p><p><strong>Materials and methods: </strong>A retrospective analysis was conducted on PGx NSAIDs interruptive alerts. Response options were classified into five categories (1) continuation of triggering NSAID order, (2) dose modification, (3) alternative NSAID ordered without PGx implications, (4) alternative analgesic (ie, opioid) ordered, and (5) discontinuation of NSAID without alternative therapy.</p><p><strong>Results: </strong>The study analyzed 2361 alert instances from 978 patients. The most common response was discontinuing NSAID without alternative therapy (43%). Dose modifications and orders for alternative analgesics comprised 2.57% and 14.67% of responses, respectively. The initial acceptance rate was 62.6%. Prior NSAID use significantly impacted override rates (60% vs 40%, <i>P</i> < .001). A 409-day breaking point was observed to affect alert acceptance rates, with the highest acceptance in NSAID naïve patients (96.1%).</p><p><strong>Discussion: </strong>PGx NSAIDs CDS alert acceptance rates were higher compared to general CDS acceptance rates. 
This study highlights opportunities for continuous improvement including optimizing alert modality, modifying alert criteria to include look-back periods, and implementing genetically adapted ordersets.</p><p><strong>Conclusion: </strong>The initial acceptance rate of PGx NSAIDs CDS alerts was 62.6%, however, with significantly higher acceptance rates in NSAID naïve patients (62.6% vs 96.1%, <i>P</i> < .001). Integration of CDS is vital to the successful implementation of PGx in clinical practice.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 5","pages":"ooaf112"},"PeriodicalIF":3.4,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12492481/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145233287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}