Leveraging ChatGPT for thematic analysis of medical best practice advisory data.
Pub Date: 2025-10-27 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf126
Yejin Jeong, Margaret Smith, Robert J Gallo, Lisa Marie Knowlton, Steven Lin, Lisa Shieh
Objectives: To evaluate ChatGPT's ability to perform thematic analysis of medical Best Practice Advisory (BPA) free-text comments and identify prompt engineering strategies that optimize performance.
Materials and methods: We analyzed 778 BPA comments from a pilot AI-enabled clinical deterioration intervention at Stanford Hospital, categorized as reasons for deterioration (Category 1) and care team actions (Category 2). Prompt engineering strategies (role, context specification, stepwise instructions, few-shot prompting, and dialogue-based calibration) were tested on a 20% random subsample to determine the best-performing prompt. Using that prompt, ChatGPT conducted deductive coding on the full dataset followed by inductive analysis. Agreement with human coding was assessed as inter-rater reliability (IRR) using Cohen's Kappa (κ).
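For readers less familiar with the agreement statistic, a minimal sketch of an IRR computation using scikit-learn; the coded labels below are invented for illustration and are not drawn from the study's BPA comments or its actual tooling.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical deductive codes assigned to the same 8 comments by a
# human coder and by ChatGPT (the code labels are invented).
human   = ["sepsis", "bleeding", "sepsis", "other", "arrhythmia", "sepsis", "other", "bleeding"]
chatgpt = ["sepsis", "bleeding", "other",  "other", "arrhythmia", "sepsis", "other", "bleeding"]

# Cohen's kappa corrects raw percent agreement for chance agreement:
# kappa = (p_o - p_e) / (1 - p_e).
kappa = cohen_kappa_score(human, chatgpt)
print(f"kappa = {kappa:.2f}")  # values above ~0.61 are conventionally "substantial"
```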
Results: With structured prompts and calibration, ChatGPT achieved substantial agreement with human coding (κ = 0.76 for Category 1; κ = 0.78 for Category 2). Baseline agreement was higher for Category 1 than Category 2, reflecting differences in comment type and complexity, but calibration improved both. Inductive analysis yielded 9 themes, with ChatGPT-generated themes closely aligning with human coding.
Discussion: ChatGPT can accelerate qualitative analysis, but its rigor depends heavily on prompt engineering. Key strategies included role and context specification, pulse-check calibration, and safeguard techniques, which enhanced reliability and reproducibility.
Conclusion: This study demonstrates the feasibility of ChatGPT-assisted thematic analysis and introduces a structured approach for applying LLMs to qualitative analysis of clinical free-text data, underscoring prompt engineering as a methodological lever.
Perceptions of and barriers to health information exchange use among emergency medicine and inpatient internal medicine clinicians in the Atlanta, Georgia metropolitan region.
Pub Date: 2025-10-26 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf131
Sara D Turbow, Priscilla H Kim, Camille P Vaughan, Mohammed K Ali, Carolyn K Clevenger, Molly M Perkins
Background: Health information exchanges (HIEs), tools that electronically share clinical data across healthcare organizations, provide an opportunity to improve patient care. Although widely available, HIE is used in only 2%-10% of patient encounters, and few studies have explored current barriers to use. The goal of this study was to evaluate clinician perspectives on HIE and barriers to its use at the point of care.
Methods: We conducted a population-based survey of internal medicine (IM) and emergency medicine (EM) physicians, physician assistants, and nurse practitioners at 8 health systems in the Atlanta area. Survey responses were analyzed overall and by specialty.
Results: Of 1239 clinicians invited to participate, 276 (22.3%) responded; 65.6% of respondents worked in inpatient IM and 32.6% in EM. Overall, 80.4% of respondents reported using HIE at least once a day, while 4.8% reported never using it. Most clinicians used HIE at least daily to access lab results (80.2%), clinical notes (81.9%), imaging reports (74.0%), and medication lists (71.2%). The most frequently reported barriers to HIE utilization were unavailability of needed information (66.4%), added time to patient care (45.5%), and the ease of simply reordering tests (31.6%). HIE use and reported barriers were similar across IM and EM providers.
Conclusions: Among survey respondents, daily HIE use was common. We identified several barriers to HIE use, which can inform targeted interventions to improve utilization and patient care. Approaches to reach survey non-responders are also needed.
Deploying machine learning models in clinical settings: a real-world feasibility analysis for a model identifying adult-onset type 1 diabetes initially classified as type 2.
Pub Date: 2025-10-26 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf133
Irene Brusini, Suyin Lee, Jacob Hollingsworth, Amanda Sees, Matthew Hackenberg, Harm Scherpbier, Raquel López-Díez, Nadejda Leavitt
Objective: This study evaluates the performance and deployment feasibility of a machine learning (ML) model to identify adult-onset type 1 diabetes (T1D) initially coded as type 2 on electronic medical records (EMRs) from a health information exchange (HIE). To our knowledge, this is the first evaluation of such a model on real-world HIE data.
Materials and methods: An existing ML model, trained on national US EMR data, was tested on a regional HIE dataset, after several adjustments for compatibility. A localized model retrained on the regional dataset was compared to the national model. Discrepancies between the 2 datasets' features and cohorts were also investigated.
Results: The national model performed well on HIE data (AUROC = 0.751; precision at 5% recall [PR5] = 25.5%), and localization further improved performance (AUROC = 0.774; PR5 = 35.4%). Differences in the 2 models' top predictors reflected the discrepancies between the datasets and gaps in HIE data capture.
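Precision at fixed recall is a less familiar metric than AUROC; the sketch below shows one common way to read it off the precision-recall curve, using invented scores rather than the study's data or code, where PR5 is taken as the best precision attainable while still recalling at least 5% of true cases.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_auc_score

rng = np.random.default_rng(0)
# Invented stand-in data: rare positives (~5%) with weakly informative scores.
y_true = rng.binomial(1, 0.05, size=5000)
y_score = np.clip(0.3 * y_true + rng.normal(0.4, 0.2, size=5000), 0, 1)

precision, recall, _ = precision_recall_curve(y_true, y_score)
# PR5: best precision at an operating point that still recalls >= 5%
# of true cases (a high-threshold screening regime).
pr5 = precision[recall >= 0.05].max()
print(f"AUROC = {roc_auc_score(y_true, y_score):.3f}, PR5 = {pr5:.1%}")
```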
Discussion: The adjustments needed for testing on HIE data highlight the importance of aligning algorithm design with deployment needs. Moreover, localization increased precision, making it more appealing for patient screening, but added complexity and may impact scalability. Additionally, while HIEs offer opportunities for large-scale deployment, data inconsistencies across member organizations could undermine accuracy and providers' trust in ML-based tools.
Conclusion: Our findings offer valuable insights into the feasibility of at-scale deployment of ML models for high-risk patient identification. Although this work focuses on detecting potentially misclassified T1D, our learnings can also inform other applications.
Implementing integrated genomic risk assessments for breast cancer: lessons learned from the Electronic Medical Records and Genomics study.
Pub Date: 2025-10-23 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf113
Cong Liu, Katherine D Crew, Jennifer Morse, Jodell E Linder, Antonis C Antoniou, Tim Carver, Josh Cortopassi, Josh F Peterson, Casey N Ta, Christin Hoell, Cynthia Prows, Eimear E Kenny, Emily Miller, Emma Perez, Gail P Jarvik, Harris T Bland, Jacqueline A Odgis, Kathleen F Mittendorf, Katherine E Bonini, Kyle McGuffin, Leah C Kottyan, Mary Maradik, Nita Limdi, Noura S Abul-Husn, Priya N Marathe, Sabrina A Suckiel, Sienna Aguilar, Toni J Lewis, Wei-Qi Wei, Yuan Luo, Robert R Freimuth, Hakon Hakonarson, Chunhua Weng, Wendy K Chung, Georgia L Wiesner
Objectives: To implement an automated multi-institutional pipeline that delivers breast-cancer risk integrated with polygenic risk scores (PRS), monogenic variants, family history, and clinical factors, emphasizing operational challenges and their solutions.
Materials and methods: A 5-stage process was executed at 10 sites. Data streams from REDCap surveys, PRS and monogenic reports, and MeTree pedigrees were normalized and forwarded through a REDCap plug-in to the CanRisk API.
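For illustration only, a minimal sketch of the kind of normalize-and-forward step such a plug-in performs, posting harmonized participant data to a risk API. The endpoint URL, payload fields, and `CANRISK_TOKEN` are hypothetical placeholders; the real CanRisk API interface and the study's plug-in are not reproduced here.

```python
import requests

# Hypothetical payload: harmonized survey + pedigree data for one participant.
payload = {
    "pedigree": "...",                          # pedigree text (placeholder)
    "prs": {"alpha": 0.45, "zscore": 1.2},      # hypothetical PRS fields
    "risk_factors": {"age": 52, "menarche": 13},
}

# Placeholder endpoint and auth; the actual CanRisk API differs.
resp = requests.post(
    "https://example.canrisk-api.invalid/boadicea/v1/predict",
    json=payload,
    headers={"Authorization": "Token CANRISK_TOKEN"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```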
Results: Integrated risk estimates were returned for >10 000 women; 3.6% had ≥25% lifetime risk and 0.9% carried pathogenic variants. Pipeline-generated scores aligned well with manually generated ones. Major barriers, such as heterogeneous pedigree formats, missing data, edge-case handling, and evolving model versions, were identified and resolved through mapping rules, imputation, and iterative testing.
Discussion: Cross-platform data harmonization and stakeholder alignment were decisive for success. Borderline-risk communication and model-version drift remain open issues.
Conclusion: Large-scale PRS-integrated breast-cancer risk reporting is feasible but requires robust interoperability standards and iterative governance.
Grounded large language models for diagnostic prediction in real-world emergency department settings.
Pub Date: 2025-10-21 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf119
Alexandre Niset, Ines Melot, Margaux Pireau, Alexandre Englebert, Nathan Scius, Julien Flament, Salim El Hadwe, Mejdeddine Al Barajraji, Henri Thonon, Sami Barrit
Objective: To evaluate predictive diagnostic performance of open- and closed-source large language models (LLMs) in emergency medicine, addressing the urgent need for innovative clinical decision support tools amid rising patient volumes and staffing shortages.
Materials and methods: We generated 2370 AI-driven diagnostic predictions (Top-5 diagnoses from each of 6 model pipelines per patient), using data from 79 real-world emergency department cases collected consecutively during a 24-hour peak influx period at a tertiary care center. Pipelines combined open- and closed-source embedding models (text-embedding-ada-002, MXBAI) with foundational models (GPT-4, Llama3, and Qwen2) grounded via retrieval-augmented generation using emergency medicine textbooks. Models' predictions were assessed against reference diagnoses established by expert consensus.
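To make the retrieval-augmented generation pattern concrete, a minimal sketch of one closed-source pipeline variant, assuming the OpenAI Python client (v1 interface) and a small in-memory chunk store; the textbook chunks, prompt wording, and `top5_diagnoses` helper are illustrative placeholders, not the study's code.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder textbook passages standing in for the indexed knowledge base.
chunks = ["...textbook passage on chest pain...", "...passage on dyspnea..."]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

chunk_vecs = embed(chunks)

def top5_diagnoses(case_text, k=2):
    # Retrieve the k most similar textbook chunks by cosine similarity.
    q = embed([case_text])[0]
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    context = "\n".join(chunks[i] for i in np.argsort(sims)[::-1][:k])
    # Ground the foundation model on the retrieved context.
    msg = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are an emergency physician."},
            {"role": "user", "content": f"Context:\n{context}\n\nCase:\n{case_text}\n\n"
                                        "List the 5 most likely diagnoses, citing the context."},
        ],
    )
    return msg.choices[0].message.content
```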
Results: All pipelines achieved comparable diagnostic match rates (62.03%-72.15%). Diagnostic performance was significantly influenced by case characteristics: match rates were notably higher for specific versus unspecific diagnoses (85.53% vs 31.41%, P < .001) and surgical versus medical cases (79.49% vs 56.25%, P < .001). Open-source models demonstrated markedly superior sourcing capabilities compared to GPT-4-based combinations (P < 1.4e-12), with the MXBAI/Qwen2 pipeline achieving perfect citation verification.
Discussion: Diagnostic accuracy primarily depended on case characteristics rather than the choice of model pipeline, highlighting fundamental AI alignment challenges in clinical reasoning. Low performance in unspecific diagnoses underscores inherent complexities in clinical definitions rather than technological shortcomings alone.
Conclusion: Open-source LLM pipelines provide enhanced sourcing capabilities, crucial for transparent clinical decision-making and interpretability. Further research should expand knowledge bases to include hospital guidelines and regional epidemiology, while exploring on-premises solutions to better align with privacy regulations and clinical integration.
Ethical sourcing in the context of health data supply chain management: a value sensitive design approach.
Pub Date: 2025-10-21 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf101
Camille Nebeker, Jean Christophe Bélisle-Pipon, Benjamin X Collins, Ashley Cordes, Kadija Ferryman, Brian J McInnis, Shannon K McWeeney, Laurie L Novak, Susannah Rose, Joseph M Yracheta, Ishan C Williams, Xiaoqian Jiang, Ellen W Clayton, Bradley A Malin
Objective: The Bridge2AI program is establishing rules of practice for creating ethically sourced health data repositories to support the effective use of ML/AI in biomedical and behavioral research. Given the initially undefined nature of ethically sourced data, this work concurrently developed definitions and guidelines alongside repository creation, grounded in a practical, operational framework.
Materials and methods: A Value Sensitive Design (VSD) approach was used to explore ethical tensions across stages of health data repository development. The conceptual investigation drew from supply chain management (SCM) processes to (1) identify actors who would interact with or be affected by the data repository's use and outcomes; (2) determine what values to consider (ie, traceability, accountability, security); and (3) analyze and document value trade-offs (ie, balancing risks of harm against improvements in healthcare). This SCM framework provides operational guidance for managing complex, multi-source data flows with embedded bias mitigation strategies.
Results: This conceptual investigation identified the actors, values, and tensions that influence ethical sourcing when creating a health data repository. The SCM steps provide a scaffolding to support ethical sourcing across the pre-model stages of health data repository development. Ethical sourcing includes documenting data provenance, articulating expectations for experts, and establishing practices for ensuring data privacy, equity, and public benefit. Challenges include risks of ethics washing, highlighting the need for transparent, value-driven practices.
Discussion: Integrating VSD with SCM frameworks enables operationalization of ethical values, improving data integrity, mitigating biases, and enhancing trust. This approach highlights how foundational decisions influence repository quality and AI/ML system usability, addressing provenance, traceability, redundancy, and risk management central to ethical data sourcing.
Conclusion: To create authentic, impactful health data repositories that serve public health goals, organizations must prioritize transparency, accountability, and operational frameworks like SCM that comprehensively address the complexities and risks inherent in data stewardship.
Development and evaluation of a patient-centric approach for accurate medication capture.
Pub Date: 2025-10-16 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf118
Larry Ma, Alan V Rincon, Joshua Ide, Katrina O'Hara, Rachel Weinstein, Sebastien Hannay, Lucie Keunen, Vincent Keunen, Ivelina Popova, Sherry Yan
Objective: To develop and evaluate a patient-centric medication module within a personal health record (PHR) app for capturing medication use, focusing on accuracy, usability, and concordance.
Materials and methods: The medication module offered 4 entry methods: picklist, National Drug Code (NDC), free-text, and portal import, with the first 2 leveraging RxNorm and openFDA APIs. Patients from an integrated delivery network (IDN) created medication lists and recorded daily use in the app's diary. Pharmacists evaluated medication accuracy by reviewing patient-uploaded medication images. Usability was measured using the System Usability Scale (SUS). Concordance was assessed by comparing Electronic Health Records (EHR) with diary entries.
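As one plausible building block behind the picklist method, a short sketch querying the public RxNav REST API to map a typed drug name to RxNorm identifiers; how the study app actually calls the RxNorm and openFDA APIs is an assumption not detailed in the abstract.

```python
import requests

def rxnorm_candidates(name: str) -> list[str]:
    """Resolve a typed drug name to RxNorm concept IDs via the public RxNav API."""
    resp = requests.get(
        "https://rxnav.nlm.nih.gov/REST/rxcui.json",
        params={"name": name},
        timeout=10,
    )
    resp.raise_for_status()
    # The idGroup may be empty when no exact-name match exists.
    return resp.json().get("idGroup", {}).get("rxnormId", [])

print(rxnorm_candidates("aspirin"))  # eg ['1191']
```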
Results: Over a 14-day period, 85 patients entered 617 medications, of which 533 were logged in the diary as representing current use. The picklist was the most used entry method. Overall medication entry accuracy was 92% (picklist 97%; NDC 87%; free-text 84%; portal import 100%). The mean SUS score was 56.5 for the study app (rated by patients) and 80.8 for the medication module (rated by pharmacists). EHR concordance with diary entries was low (25% using the 14-day window; 53% using a 1-year window); most unmatched entries were over-the-counter (OTC) medications.
Discussion: Accurate and complete medication records are essential for the safe and effective use of medications. This patient-centric medication module supported accurate capture of prescription and OTC medications. Gaps in EHR data highlight the need to improve medication record accuracy and reconciliation.
Conclusion: Patient-generated health data can have a central role in creating the "Best Possible Medication History" envisioned by the World Health Organization.
Development and implementation of an entity relationship diagram for perinatal data.
Pub Date: 2025-10-16 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf117
Alison M Stuebe, Randall Blanco, Michael Horvath, Mohammad Golam Kibria, Lauren Kucirka, Karl Shieh, David Page, Metin N Gurcan, William Ed Hammond
Objective: Severe maternal morbidity and mortality are higher in the United States than in other high-income countries, and unacceptable disparities persist. To facilitate research on these outcomes, we developed a standardized approach for extracting perinatal data from electronic health records (EHRs).
Materials and methods: To support data model building and validation, we harmonized perinatal EHR data in a common data model, building on lessons learned from multiple prior projects.
Results: We developed an Entity Relationship Diagram (ERD) that aggregates perinatal EHR data at appropriate granularities (ie, mothers, infants, encounters) with indexing of observations to gestational age and time from delivery. We then developed a standard approach to extract, transform, and load pregnancy-related observations from EHRs for inclusion in PCORnet® Common Data Model tables.
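To make the indexing idea concrete, a small pandas sketch, with invented table and column names, that anchors each observation to gestational age at observation and to time from delivery; it is a sketch of the concept, not the study's ETL code or the PCORnet CDM layout.

```python
import pandas as pd

# Invented example tables: one pregnancy episode and its observations.
pregnancies = pd.DataFrame({
    "pregnancy_id": [1],
    "lmp_date": pd.to_datetime(["2024-01-10"]),      # last menstrual period
    "delivery_date": pd.to_datetime(["2024-10-05"]),
})
obs = pd.DataFrame({
    "pregnancy_id": [1, 1, 1],
    "obs_date": pd.to_datetime(["2024-03-01", "2024-09-20", "2024-10-12"]),
    "obs_concept": ["systolic_bp", "hemoglobin", "systolic_bp"],
})

merged = obs.merge(pregnancies, on="pregnancy_id")
# Index each observation to gestational age (days since LMP) and to delivery
# (negative values = antepartum, positive = postpartum).
merged["gestational_age_days"] = (merged["obs_date"] - merged["lmp_date"]).dt.days
merged["days_from_delivery"] = (merged["obs_date"] - merged["delivery_date"]).dt.days
print(merged[["obs_concept", "gestational_age_days", "days_from_delivery"]])
```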
Discussion: Our ERD can facilitate cross-institutional research to identify populations at risk and prompt interventions to improve perinatal outcomes.
Conclusion: A structured approach can accelerate the use of EHR data for perinatal research.
ECG-FM: an open electrocardiogram foundation model.
Pub Date: 2025-10-16 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf122
Kaden McKeen, Sameer Masood, Augustin Toma, Barry Rubin, Bo Wang
Objectives: To develop ECG-FM, an open-weight foundation model for electrocardiogram (ECG) analysis, rigorously evaluate its performance on clinically salient tasks, and openly release it alongside a public benchmark.
Materials and methods: In a study using 1.5 million 12-lead ECGs, we present ECG-FM, a transformer-based foundation model pretrained with hybrid self-supervision that combines masked reconstruction and contrastive learning with ECG-specific augmentation. Downstream, we evaluate multi-label ECG interpretation and prediction of reduced left ventricular ejection fraction (LVEF), introducing an openly available benchmark on the MIMIC-IV-ECG dataset. We assess ECG-FM's capabilities through data scaling experiments, latent-space structure analysis, and attention-based saliency.
Results: Finetuned ECG-FM models outperform task-specific baselines in the small-to-medium-scale data regime, exhibit strong label efficiency and cross-dataset generalizability, and achieve high AUROC on salient labels, including atrial fibrillation (0.996) and LVEF ≤40% (0.929). The pretrained encoder showcases competitive linear probing performance, with functionally discriminative embeddings.
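As a gloss on "linear probing", a sketch that trains only a logistic-regression head on frozen encoder embeddings; the embedding matrix here is random stand-in data with a weak injected signal, not output from the released ECG-FM weights.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-ins: 2000 ECGs embedded into 768-d vectors by a frozen encoder,
# with binary atrial-fibrillation labels.
X = rng.normal(size=(2000, 768))
y = rng.binomial(1, 0.1, size=2000)
X[y == 1] += 0.15  # give positives a weak signal so the probe can learn

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)
# Linear probe: the encoder stays frozen; only this linear head is trained.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUROC:", roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1]))
```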
Discussion: Findings indicate that ECG-FM is generalizable, label-efficient, and discriminative for screening, risk stratification, and monitoring. Its representations capture low-level morphology and high-order cardiac semantics, and the pretrained encoder serves as a robust feature-set generator. This work mitigates reliance on large labeled datasets, reduces compute and data requirements, and lowers barriers to reproducibility and cross-study comparison.
Conclusion: ECG-FM is an open, rigorously validated ECG foundation model intended to accelerate transparent, comparable research in the ECG analysis subfield. It is designed for rapid integration and evaluation, especially for delivering practical gains in low-label settings. We release our code, model weights, tutorials, and benchmark at https://github.com/bowang-lab/ECG-FM/.
High-fidelity parameter-efficient fine-tuning for joint recognition and linking of diagnoses to ICD-10 in non-standard primary care notes.
Pub Date: 2025-10-16 | eCollection Date: 2025-10-01 | DOI: 10.1093/jamiaopen/ooaf120
Cristian Estupiñán-Ojeda, Raúl J Sandomingo-Freire, Lluís Padró, Jordi Turmo
Objectives: Joint recognition and ICD-10 linking of diagnoses in bilingual, non-standard Spanish and Catalan primary care notes is challenging. We evaluate parameter-efficient fine-tuning (PEFT) techniques as a resource-conscious alternative to full fine-tuning (FFT) for multi-label clinical text classification.
Materials and methods: On a corpus of 21 812 Catalan and Spanish clinical notes from Catalonia, we compared the PEFT techniques LoRA, DoRA, LoHA, LoKR, and QLoRA applied to multilingual transformers (BERT, RoBERTa, DistilBERT, and mDeBERTa).
Results: FFT delivered the best strict Micro-F1 (63.0), but BERT-QLoRA scored 62.2, only 0.8 points lower, while reducing trainable parameters by 67.5% and memory by 33.7%. Training on combined bilingual data consistently improved generalization across individual languages.
Discussion: The small FFT margin was confined to rare labels, indicating limited benefit from updating all parameters. Among PEFT techniques, QLoRA offered the strongest accuracy-efficiency balance; LoRA and DoRA were competitive, whereas LoHA and LoKR incurred larger losses. Adapter rank mattered: ranks below 128 sharply degraded Micro-F1. The substantial memory savings enable deployment on commodity GPUs while delivering performance very close to FFT.
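A sketch of what a QLoRA configuration at the rank threshold noted above might look like, using the Hugging Face transformers/peft/bitsandbytes stack; the base checkpoint, label count, target modules, and hyperparameters are illustrative assumptions rather than the paper's exact setup.

```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased",          # stand-in multilingual encoder
    num_labels=500,                          # eg one logit per ICD-10 code
    problem_type="multi_label_classification",
    quantization_config=bnb,
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters; per the discussion, ranks below 128 degraded Micro-F1.
lora = LoraConfig(
    r=128, lora_alpha=256, lora_dropout=0.05,
    target_modules=["query", "value"],       # BERT attention projections
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()           # shows the reduced trainable share
```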
Conclusion: PEFT, particularly QLoRA, supports accurate and memory-efficient joint entity recognition and ICD-10 linking in multilingual, low-resource clinical settings.