JAMIA Open: latest articles

Leveraging ChatGPT for thematic analysis of medical best practice advisory data.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-10-27 eCollection Date: 2025-10-01 DOI: 10.1093/jamiaopen/ooaf126
Yejin Jeong, Margaret Smith, Robert J Gallo, Lisa Marie Knowlton, Steven Lin, Lisa Shieh

Objectives: To evaluate ChatGPT's ability to perform thematic analysis of medical Best Practice Advisory (BPA) free-text comments and identify prompt engineering strategies that optimize performance.

Materials and methods: We analyzed 778 BPA comments from a pilot AI-enabled clinical deterioration intervention at Stanford Hospital, categorized as reasons for deterioration (Category 1) and care team actions (Category 2). Prompt engineering strategies (role, context specification, stepwise instructions, few-shot prompting, and dialogue-based calibration) were tested on a 20% random subsample to determine the best-performing prompt. Using that prompt, ChatGPT conducted deductive coding on the full dataset followed by inductive analysis. Agreement with human coding was assessed as inter-rater reliability (IRR) using Cohen's Kappa (κ).
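
The agreement statistic named above can be sketched in a few lines. This is a minimal illustration of Cohen's kappa for two raters, not the authors' analysis code; the function name `cohens_kappa` and the ten labelled comments are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for ten comments (illustrative only, not study data).
human = ["sepsis", "sepsis", "cardiac", "resp", "resp",
         "sepsis", "cardiac", "resp", "sepsis", "cardiac"]
model = ["sepsis", "sepsis", "cardiac", "resp", "sepsis",
         "sepsis", "cardiac", "resp", "resp", "cardiac"]

kappa = cohens_kappa(human, model)
print(f"kappa = {kappa:.2f}")
```

Kappa discounts the agreement two raters would reach by chance, which is why it is preferred over raw percent agreement for coding tasks. Under the commonly used Landis and Koch convention, values from 0.61 to 0.80 are described as substantial agreement.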

Results: With structured prompts and calibration, ChatGPT achieved substantial agreement with human coding (κ = 0.76 for Category 1; κ = 0.78 for Category 2). Baseline agreement was higher for Category 1 than Category 2, reflecting differences in comment type and complexity, but calibration improved both. Inductive analysis yielded 9 themes, with ChatGPT-generated themes closely aligning with human coding.

Discussion: ChatGPT can accelerate qualitative analysis, but its rigor depends heavily on prompt engineering. Key strategies included role and context specification, pulse-check calibration, and safeguard techniques, which enhanced reliability and reproducibility.

Conclusion: This study demonstrates the feasibility of ChatGPT-assisted thematic analysis and introduces a structured approach for applying LLMs to qualitative analysis of clinical free-text data, underscoring prompt engineering as a methodological lever.

Perceptions of and barriers to health information exchange use among emergency medicine and inpatient internal medicine clinicians in the Atlanta, Georgia metropolitan region.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-10-26 eCollection Date: 2025-10-01 DOI: 10.1093/jamiaopen/ooaf131
Sara D Turbow, Priscilla H Kim, Camille P Vaughan, Mohammed K Ali, Carolyn K Clevenger, Molly M Perkins

Background: Health information exchanges (HIE), tools that electronically share clinical data across healthcare organizations, provide the opportunity to improve patient care. While widely available, HIE is utilized in only 2%-10% of patient encounters. Few studies have explored current barriers to use. The goal of this study was to evaluate current clinician perspectives on HIE and barriers to use at the point of care.

Methods: We conducted a population-based survey of internal medicine (IM) and emergency medicine (EM) physicians, physician assistants, and nurse practitioners at 8 health systems in the Atlanta area. Survey responses were analyzed overall and by specialty.

Results: Of 1239 clinicians who were invited to participate, 276 (22.3%) responded, with 65.6% of respondents working in inpatient IM and 32.6% in EM. 80.4% of respondents reported using HIE at least once a day, while 4.8% reported never using HIE. Most clinicians used HIE at least daily to access lab results (80.2%), clinical notes (81.9%), imaging reports (74.0%), and medication lists (71.2%). The most reported barriers to HIE utilization included unavailability of needed information (66.4%), adding time to patient care (45.5%), and ease of simply reordering tests (31.6%). HIE use and reported barriers to use were similar across IM and EM providers.

Conclusions: Of those responding to the survey, daily access of HIE was common. We identified several barriers to HIE use, which can be used to develop targeted interventions to improve utilization and patient care. Approaches to reach survey non-responders are also needed.

Deploying machine learning models in clinical settings: a real-world feasibility analysis for a model identifying adult-onset type 1 diabetes initially classified as type 2.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-10-26 eCollection Date: 2025-10-01 DOI: 10.1093/jamiaopen/ooaf133
Irene Brusini, Suyin Lee, Jacob Hollingsworth, Amanda Sees, Matthew Hackenberg, Harm Scherpbier, Raquel López-Díez, Nadejda Leavitt

Objective: This study evaluates the performance and deployment feasibility of a machine learning (ML) model to identify adult-onset type 1 diabetes (T1D) initially coded as type 2 on electronic medical records (EMRs) from a health information exchange (HIE). To our knowledge, this is the first evaluation of such a model on real-world HIE data.

Materials and methods: An existing ML model, trained on national US EMR data, was tested on a regional HIE dataset, after several adjustments for compatibility. A localized model retrained on the regional dataset was compared to the national model. Discrepancies between the 2 datasets' features and cohorts were also investigated.

Results: The national model performed well on HIE data (AUROC = 0.751; precision at 5% recall [PR5] = 25.5%), and localization further improved performance (AUROC = 0.774; PR5 = 35.4%). Differences in the 2 models' top predictors reflected the discrepancies between the datasets and gaps in HIE data capture.
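
The two metrics reported above can be illustrated with a short sketch. The abstract does not give the exact definition of PR5, so this assumes it is the precision of the shortest top-ranked prefix of patients whose recall reaches the target; the function names, labels, and scores below are hypothetical, and ties between scores are not given special handling in the precision function.

```python
import math


def auroc(y_true, scores):
    """AUROC as the chance that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count pairwise wins; a tie counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_recall(y_true, scores, target_recall=0.05):
    """Precision at the shortest ranked prefix whose recall reaches target_recall."""
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("no positive cases")
    # Number of true positives that must be found to reach the target recall.
    needed = max(1, math.ceil(target_recall * total_pos))
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for flagged, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos >= needed:
            return true_pos / flagged
    raise ValueError("labels must be 0 or 1")


# Hypothetical labels (1 = misclassified T1D) and model scores, not study data.
labels = [1, 0, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

print(auroc(labels, scores))
print(precision_at_recall(labels, scores, target_recall=0.5))
```

A precision figure at low recall describes how many of the highest-scoring patients are true cases, which is the quantity that matters when only a small screening list can be reviewed by clinicians.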

Discussion: The adjustments needed for testing on HIE data highlight the importance of aligning algorithm design with deployment needs. Moreover, localization increased precision, making it more appealing for patient screening, but added complexity and may impact scalability. Additionally, while HIEs offer opportunities for large-scale deployment, data inconsistencies across member organizations could undermine accuracy and providers' trust in ML-based tools.

Conclusion: Our findings offer valuable insights into the feasibility of at-scale deployment of ML models for high-risk patient identification. Although this work focuses on detecting potentially misclassified T1D, our learnings can also inform other applications.

Implementing integrated genomic risk assessments for breast cancer: lessons learned from the Electronic Medical Records and Genomics study.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-10-23 eCollection Date: 2025-10-01 DOI: 10.1093/jamiaopen/ooaf113
Cong Liu, Katherine D Crew, Jennifer Morse, Jodell E Linder, Antonis C Antoniou, Tim Carver, Josh Cortopassi, Josh F Peterson, Casey N Ta, Christin Hoell, Cynthia Prows, Eimear E Kenny, Emily Miller, Emma Perez, Gail P Jarvik, Harris T Bland, Jacqueline A Odgis, Kathleen F Mittendorf, Katherine E Bonini, Kyle McGuffin, Leah C Kottyan, Mary Maradik, Nita Limdi, Noura S Abul-Husn, Priya N Marathe, Sabrina A Suckiel, Sienna Aguilar, Toni J Lewis, Wei-Qi Wei, Yuan Luo, Robert R Freimuth, Hakon Hakonarson, Chunhua Weng, Wendy K Chung, Georgia L Wiesner

Objectives: To implement an automated multi-institutional pipeline that delivers breast cancer risk assessments integrating polygenic risk scores, monogenic variants, family history, and clinical factors, with emphasis on operational challenges and their solutions.

Materials and methods: A five-stage process was executed at ten sites. Data streams from REDCap surveys, PRS and monogenic reports, and MeTree pedigrees were normalized and forwarded through a REDCap plug-in to the CanRisk API.

Results: Integrated risk was returned to >10 000 women; 3.6% had a lifetime risk of ≥25% and 0.9% carried pathogenic variants. Pipeline-generated scores aligned well with manually generated ones. Major barriers such as heterogeneous pedigree formats, missing data, edge-case handling, and evolving model versions were identified and resolved through mapping rules, imputation, and iterative testing.

Discussion: Cross-platform data harmonization and stakeholder alignment were decisive for success. Borderline-risk communication and model-version drift remain open issues.

Conclusion: Large-scale PRS-integrated breast-cancer risk reporting is feasible but requires robust interoperability standards and iterative governance.

Grounded large language models for diagnostic prediction in real-world emergency department settings.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-10-21 eCollection Date: 2025-10-01 DOI: 10.1093/jamiaopen/ooaf119
Alexandre Niset, Ines Melot, Margaux Pireau, Alexandre Englebert, Nathan Scius, Julien Flament, Salim El Hadwe, Mejdeddine Al Barajraji, Henri Thonon, Sami Barrit

Objective: To evaluate predictive diagnostic performance of open- and closed-source large language models (LLMs) in emergency medicine, addressing the urgent need for innovative clinical decision support tools amid rising patient volumes and staffing shortages.

Materials and methods: We generated 2370 AI-driven diagnostic predictions (Top-5 diagnoses from each of 6 model pipelines per patient), using data from 79 real-world emergency department cases collected consecutively during a 24-hour peak influx period at a tertiary care center. Pipelines combined open- and closed-source embedding models (text-embedding-ada-002, MXBAI) with foundational models (GPT-4, Llama3, and Qwen2) grounded via retrieval-augmented generation using emergency medicine textbooks. Models' predictions were assessed against reference diagnoses established by expert consensus.
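
The prediction count above follows from 79 cases, 6 pipelines, and 5 diagnoses per pipeline. The sketch below checks that arithmetic and shows one way a Top-5 match rate could be computed. It assumes a match means the reference diagnosis appears verbatim in the ranked list, which is a simplification of the expert assessment described in the abstract; the function name and the four example cases are hypothetical.

```python
def top_k_match_rate(predictions, references, k=5):
    """Share of cases whose reference diagnosis is among the first k predictions."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need one ranked prediction list per reference diagnosis")
    # Exact string matching is a simplifying assumption for this sketch.
    hits = sum(ref in preds[:k] for preds, ref in zip(predictions, references))
    return hits / len(references)


# Counts taken from the abstract: 79 cases x 6 pipelines x Top-5 diagnoses.
n_cases, n_pipelines, top_k = 79, 6, 5
total_predictions = n_cases * n_pipelines * top_k

# Hypothetical ranked predictions for four cases (illustrative only).
predictions = [
    ["appendicitis", "gastroenteritis"],
    ["pneumonia", "bronchitis"],
    ["migraine", "tension headache"],
    ["renal colic", "pyelonephritis"],
]
references = ["appendicitis", "pulmonary embolism", "migraine", "pyelonephritis"]

print(total_predictions)
print(top_k_match_rate(predictions, references))
print(top_k_match_rate(predictions, references, k=1))
```

In practice, free-text diagnoses rarely match verbatim, so a clinical evaluation would replace the string comparison with reviewer judgment or a mapping to a shared terminology.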

Results: All pipelines achieved comparable diagnostic match rates (62.03%-72.15%). Diagnostic performance was significantly influenced by case characteristics: match rates were notably higher for specific versus unspecific diagnoses (85.53% vs 31.41%, P < .001) and surgical versus medical cases (79.49% vs 56.25%, P < .001). Open-source models demonstrated markedly superior sourcing capabilities compared to GPT-4-based combinations (P < 1.4e-12), with the MXBAI/Qwen2 pipeline achieving perfect citation verification.

Discussion: Diagnostic accuracy primarily depended on case characteristics rather than the choice of model pipeline, highlighting fundamental AI alignment challenges in clinical reasoning. Low performance in unspecific diagnoses underscores inherent complexities in clinical definitions rather than technological shortcomings alone.

Conclusion: Open-source LLM pipelines provide enhanced sourcing capabilities, crucial for transparent clinical decision-making and interpretability. Further research should expand knowledge bases to include hospital guidelines and regional epidemiology, while exploring on-premises solutions to better align with privacy regulations and clinical integration.

Ethical sourcing in the context of health data supply chain management: a value sensitive design approach.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-10-21 eCollection Date: 2025-10-01 DOI: 10.1093/jamiaopen/ooaf101
Camille Nebeker, Jean Christophe Bélisle-Pipon, Benjamin X Collins, Ashley Cordes, Kadija Ferryman, Brian J McInnis, Shannon K McWeeney, Laurie L Novak, Susannah Rose, Joseph M Yracheta, Ishan C Williams, Xiaoqian Jiang, Ellen W Clayton, Bradley A Malin

Objective: The Bridge2AI program is establishing rules of practice for creating ethically sourced health data repositories to support the effective use of ML/AI in biomedical and behavioral research. Given the initially undefined nature of ethically sourced data, this work concurrently developed definitions and guidelines alongside repository creation, grounded in a practical, operational framework.

Materials and methods: A Value Sensitive Design (VSD) approach was used to explore ethical tensions across stages of health data repository development. The conceptual investigation drew from supply chain management (SCM) processes to (1) identify actors who would interact with or be affected by the data repository use and outcomes; (2) determine what values to consider (ie, traceability, accountability, security); and (3) analyze and document value trade-offs (ie, balancing risks of harm against improvements in healthcare). This SCM framework provides operational guidance for managing complex, multi-source data flows with embedded bias mitigation strategies.

Results: This conceptual investigation identified the actors, values, and tensions that influence ethical sourcing when creating a health data repository. The SCM steps provide a scaffolding to support ethical sourcing across the pre-model stages of health data repository development. Ethical sourcing includes documenting data provenance, articulating expectations for experts, and practices for ensuring data privacy, equity, and public benefit. Challenges include risks of ethics washing and highlight the need for transparent, value-driven practices.

Discussion: Integrating VSD with SCM frameworks enables operationalization of ethical values, improving data integrity, mitigating biases, and enhancing trust. This approach highlights how foundational decisions influence repository quality and AI/ML system usability, addressing provenance, traceability, redundancy, and risk management central to ethical data sourcing.

Conclusion: To create authentic, impactful health data repositories that serve public health goals, organizations must prioritize transparency, accountability, and operational frameworks like SCM that comprehensively address the complexities and risks inherent in data stewardship.

Citations: 0
Development and evaluation of a patient-centric approach for accurate medication capture.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-10-16 eCollection Date: 2025-10-01 DOI: 10.1093/jamiaopen/ooaf118
Larry Ma, Alan V Rincon, Joshua Ide, Katrina O'Hara, Rachel Weinstein, Sebastien Hannay, Lucie Keunen, Vincent Keunen, Ivelina Popova, Sherry Yan

Objective: To develop and evaluate a patient-centric medication module within a personal health record (PHR) app for capturing medication use, focusing on accuracy, usability, and concordance.

Materials and methods: The medication module offered 4 entry methods: picklist, National Drug Code (NDC), free-text, and portal import, with the first 2 leveraging RxNorm and openFDA APIs. Patients from an integrated delivery network (IDN) created medication lists and recorded daily use in the app's diary. Pharmacists evaluated medication accuracy by reviewing patient-uploaded medication images. Usability was measured using the System Usability Scale (SUS). Concordance was assessed by comparing Electronic Health Records (EHR) with diary entries.
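For readers unfamiliar with the System Usability Scale, the sketch below shows how ten questionnaire responses are conventionally turned into a 0-100 score. It assumes the standard ten-item, five-point questionnaire and the usual scoring rule; the abstract does not say how the study computed its scores, and the function name `sus_score` is illustrative only.

```python
def sus_score(responses):
    """Convert ten SUS item responses (each 1-5) into a 0-100 score.

    Standard scoring rule (an assumption here, not taken from the study):
    odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response), and the sum is multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("each response must be between 1 and 5")
        if item_number % 2 == 1:
            total += response - 1   # positively worded item
        else:
            total += 5 - response   # negatively worded item
    return total * 2.5


if __name__ == "__main__":
    # A respondent who answers "3" (neutral) to every item.
    print(sus_score([3] * 10))
```

A study-level SUS figure is then typically the mean of these per-respondent scores.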

Results: Over a 14-day period, 85 patients entered 617 medications, with 533 logged in the diary representing current use. Picklist was the most used entry method. Overall medication entry accuracy was 92% (picklist 97%; NDC 87%; free-text 84%; and portal import 100%). The mean system usability score was 56.5 for the study app (patients) and 80.8 for the medication module (pharmacists). EHR concordance with diary entries was low (25% using the 14-day window; 53% using a 1-year window); most unmatched entries were over-the-counter (OTC) medications.

Discussion: Accurate and complete medication records are essential for the safe and effective use of medications. This patient-centric medication module supported accurate capture of prescription and OTC medications. Gaps in EHR data highlight the need to improve medication record accuracy and reconciliation.

Conclusion: Patient-generated health data can have a central role in creating the "Best Possible Medication History" envisioned by the World Health Organization.

Citations: 0
Development and implementation of an entity relationship diagram for perinatal data.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-10-16 eCollection Date: 2025-10-01 DOI: 10.1093/jamiaopen/ooaf117
Alison M Stuebe, Randall Blanco, Michael Horvath, Mohammad Golam Kibria, Lauren Kucirka, Karl Shieh, David Page, Metin N Gurcan, William Ed Hammond

Objective: Severe maternal morbidity and mortality are higher in the United States than in other high-income countries, and unacceptable disparities persist. To facilitate research on these outcomes, we developed a standardized approach for extracting perinatal data from electronic health records (EHRs).

Materials and methods: To support data model building and validation, we harmonized perinatal EHR data in a common data model, building on lessons learned from multiple prior projects.

Results: We developed an Entity Relationship Diagram (ERD) that aggregates perinatal EHR data at appropriate granularities (ie, mothers, infants, encounters) with indexing of observations to gestational age and time from delivery. We then developed a standard approach to extract, transform, and load pregnancy-related observations from EHRs for inclusion in PCORnet® Common Data Model tables.
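To make the idea of indexing observations to gestational age and time from delivery concrete, here is a minimal sketch. Every class and field name (`Pregnancy`, `Observation`, `estimated_start`, and so on) is hypothetical; the abstract does not publish the ERD's actual entities or columns, and this is not the PCORnet Common Data Model schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Pregnancy:
    # Hypothetical fields, not the published ERD.
    mother_id: str
    estimated_start: date  # assumed dating anchor (gestational day 0)
    delivery_date: date


@dataclass
class Observation:
    # Hypothetical fields, not the published ERD.
    mother_id: str
    observed_on: date
    name: str
    value: float


def index_observation(pregnancy, obs):
    """Return (gestational_age_days, days_from_delivery) for one observation.

    days_from_delivery is negative before delivery and positive after it.
    """
    if obs.mother_id != pregnancy.mother_id:
        raise ValueError("observation belongs to a different mother")
    gestational_age_days = (obs.observed_on - pregnancy.estimated_start).days
    days_from_delivery = (obs.observed_on - pregnancy.delivery_date).days
    return gestational_age_days, days_from_delivery


if __name__ == "__main__":
    pregnancy = Pregnancy("m1", date(2024, 1, 1), date(2024, 10, 1))
    reading = Observation("m1", date(2024, 7, 1), "systolic_bp", 128.0)
    print(index_observation(pregnancy, reading))
```

Storing both offsets lets the same observation be queried either by stage of pregnancy or by distance from the delivery event.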

Discussion: Our ERD can facilitate cross-institutional research to identify populations at risk and prompt interventions to improve perinatal outcomes.

Conclusion: A structured approach can accelerate the use of EHR data for perinatal research.

Citations: 0
ECG-FM: an open electrocardiogram foundation model.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-10-16 eCollection Date: 2025-10-01 DOI: 10.1093/jamiaopen/ooaf122
Kaden McKeen, Sameer Masood, Augustin Toma, Barry Rubin, Bo Wang

Objectives: To develop ECG-FM, an open-weight foundation model for electrocardiogram (ECG) analysis, rigorously evaluate its performance on clinically salient tasks, and openly release it alongside a public benchmark.

Materials and methods: In a study using 1.5 million 12-lead ECGs, we present ECG-FM, a transformer-based foundation model pretrained with hybrid self-supervision that combines masked reconstruction and contrastive learning with ECG-specific augmentation. Downstream, we evaluate multi-label ECG interpretation and prediction of reduced left ventricular ejection fraction (LVEF), introducing an openly available benchmark on the MIMIC-IV-ECG dataset. We assess ECG-FM's capabilities through data scaling experiments, latent-space structure analysis, and attention-based saliency.

Results: Finetuned ECG-FM models outperform task-specific baselines in the small-to-medium-scale data regime, exhibit strong label efficiency and cross-dataset generalizability, and achieve high AUROC on salient labels, including atrial fibrillation (0.996) and LVEF ≤ 40% (0.929). The pretrained encoder showcases competitive linear probing performance, with functionally discriminative embeddings.
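The AUROC values reported above can be read as the probability that a randomly chosen positive ECG receives a higher score than a randomly chosen negative one. The sketch below computes that quantity by direct pairwise comparison on toy data; it is a generic illustration, not the ECG-FM evaluation code, and the labels and scores are made up.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    example has the higher score; tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    toy_labels = [0, 0, 1, 1]            # illustrative only
    toy_scores = [0.1, 0.4, 0.35, 0.8]   # illustrative only
    print(auroc(toy_labels, toy_scores))
```

The quadratic loop is fine for small examples; for datasets of realistic size a rank-based implementation would be the usual choice.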

Discussion: Findings indicate that ECG-FM is generalizable, label-efficient, and discriminative for screening, risk stratification, and monitoring. Its representations capture low-level morphology and high-order cardiac semantics, and the pretrained encoder serves as a robust feature-set generator. This work mitigates reliance on large labeled datasets, reduces compute and data requirements, and lowers barriers to reproducibility and cross-study comparison.

Conclusion: ECG-FM is an open, rigorously validated ECG foundation model intended to accelerate transparent, comparable research in the ECG analysis subfield. It is designed for rapid integration and evaluation, especially for delivering practical gains in low-label settings. We release our code, model weights, tutorials, and benchmark at https://github.com/bowang-lab/ECG-FM/.

Citations: 0
High-fidelity parameter-efficient fine-tuning for joint recognition and linking of diagnoses to ICD-10 in non-standard primary care notes.
IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-10-16 eCollection Date: 2025-10-01 DOI: 10.1093/jamiaopen/ooaf120
Cristian Estupiñán-Ojeda, Raúl J Sandomingo-Freire, Lluís Padró, Jordi Turmo

Objectives: Joint recognition and ICD-10 linking of diagnoses in bilingual, non-standard Spanish and Catalan primary care notes is challenging. We evaluate parameter-efficient fine-tuning (PEFT) techniques as a resource-conscious alternative to full fine-tuning (FFT) for multi-label clinical text classification.

Materials and methods: On a corpus of 21 812 Catalan and Spanish clinical notes from Catalonia, we compared the PEFT techniques LoRA, DoRA, LoHA, LoKR, and QLoRA applied to multilingual transformers (BERT, RoBERTa, DistilBERT, and mDeBERTa).

Results: FFT delivered the best strict Micro-F1 (63.0), but BERT-QLoRA scored 62.2, only 0.8 points lower, while reducing trainable parameters by 67.5% and memory by 33.7%. Training on combined bilingual data consistently improved generalization across individual languages.
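As a reminder of what a Micro-F1 score measures in a multi-label setting, the sketch below pools true positives, false positives, and false negatives over all notes before computing F1. It assumes each note is represented as a set of ICD-10 codes; the abstract's "strict" variant presumably applies a tighter matching criterion that is not specified here, and the example codes are made up.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over a list of per-note label sets."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = 0
    for gold_labels, predicted_labels in zip(gold, predicted):
        g, p = set(gold_labels), set(predicted_labels)
        tp += len(g & p)   # codes predicted and correct
        fp += len(p - g)   # codes predicted but not in gold
        fn += len(g - p)   # gold codes that were missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Illustrative labels only, not data from the study.
    gold = [{"J45", "I10"}, {"E11"}]
    predicted = [{"J45"}, {"E11", "K21"}]
    print(round(micro_f1(gold, predicted), 4))
```

Because counts are pooled before averaging, frequent labels dominate the micro score, which is consistent with a gap on rare labels having only a small effect on it.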

Discussion: The small FFT margin was confined to rare labels, indicating limited benefit from updating all parameters. Among PEFT techniques, QLoRA offered the strongest accuracy-efficiency balance; LoRA and DoRA were competitive, whereas LoHA and LoKR incurred larger losses. Adapter rank mattered: ranks below 128 sharply degraded Micro-F1. The substantial memory savings enable deployment on commodity GPUs while delivering performance very close to FFT.
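The link between adapter rank and trainable parameters can be illustrated with simple arithmetic. In LoRA, a frozen weight matrix of shape d_out × d_in is augmented with two low-rank factors of shapes d_out × r and r × d_in, so the trainable count per adapted matrix is r · (d_out + d_in). The sketch below uses a 768 × 768 matrix purely as an assumed example size; it does not reproduce the 67.5% figure reported in the abstract, which depends on which layers were adapted.

```python
def full_trainable_params(d_out, d_in):
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two LoRA factors: B (d_out x rank) and A (rank x d_in)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d = 768  # assumed hidden size, for illustration only
    for rank in (8, 32, 128):
        lora = lora_trainable_params(d, d, rank)
        full = full_trainable_params(d, d)
        print(rank, lora, f"{lora / full:.1%} of full")
```

The count grows linearly with rank, so a high rank such as 128 gives up part of the parameter savings in exchange for adapter capacity.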

Conclusion: PEFT, particularly QLoRA, supports accurate and memory-efficient joint entity recognition and ICD-10 linking in multilingual, low-resource clinical settings.

Citations: 0