
Journal of the American Medical Informatics Association: Latest Articles

Interdisciplinary systems may restore the healthcare professional-patient relationship in electronic health systems.
IF 4.7 · Zone 2 (Medicine) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-01-17 · DOI: 10.1093/jamia/ocaf001
Michael R Cauley, Richard J Boland, S Trent Rosenbloom

Objective: To develop a framework that models the impact of electronic health record (EHR) systems on healthcare professionals' well-being and their relationships with patients, using interdisciplinary insights to guide machine learning in identifying value patterns important to healthcare professionals in EHR systems.

Materials and methods: A theoretical framework of EHR systems' implementation was developed using interdisciplinary literature from healthcare, information systems, and management science focusing on the systems approach, clinical decision-making, and interface terminologies.

Observations: Healthcare professionals balance personal norms of narrative and data-driven communication in knowledge creation for EHRs by integrating detailed patient stories with structured data. This integration forms 2 learning loops that create tension in the healthcare professional-patient relationship, shaping how healthcare professionals apply their values in care delivery. The manifestation of this value tension in EHRs directly affects the well-being of healthcare professionals.

Discussion: Understanding the value tension learning loop between structured data and narrative forms lays the groundwork for future studies of how healthcare professionals use EHRs to deliver care, emphasizing their well-being and patient relationships through a sociotechnical lens.

Conclusion: EHR systems can improve the healthcare professional-patient relationship and healthcare professional well-being by integrating norms and values into pattern recognition of narrative and data communication forms.

Citations: 0
Improving large language model applications in biomedicine with retrieval-augmented generation: a systematic review, meta-analysis, and clinical development guidelines.
IF 4.7 · Zone 2 (Medicine) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-01-15 · DOI: 10.1093/jamia/ocaf008
Siru Liu, Allison B McCoy, Adam Wright

Objective: This study aims to synthesize findings from recent research on retrieval-augmented generation (RAG) and large language models (LLMs) in biomedicine and to provide clinical development guidelines for improving their effectiveness.

Materials and methods: We conducted a systematic literature review and a meta-analysis, reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement. Searches were performed in 3 databases (PubMed, Embase, PsycINFO) using terms related to "retrieval augmented generation" and "large language model" for articles published in 2023 and 2024. We selected studies that compared baseline LLM performance with RAG performance. We developed a random-effects meta-analysis model, using the odds ratio as the effect size.
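The random-effects pooling described above can be illustrated with a minimal Python sketch. It assumes the DerSimonian-Laird estimator of between-study variance, a common choice that the abstract does not confirm, and takes per-study log odds ratios and their variances as input; the function name `dersimonian_laird` and the example studies are hypothetical.

```python
import math

# 97.5th percentile of the standard normal distribution (two-sided 95% CI)
Z_975 = 1.959963984540054


def dersimonian_laird(log_ors, variances):
    """Pool per-study log odds ratios with a DerSimonian-Laird random-effects model.

    Returns (pooled_or, ci_low, ci_high, tau2), with the CI on the odds-ratio scale.
    """
    k = len(log_ors)
    if k < 2 or k != len(variances):
        raise ValueError("need at least 2 studies and one variance per study")
    # Fixed-effect (inverse-variance) weights and estimate
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return (math.exp(pooled),
            math.exp(pooled - Z_975 * se),
            math.exp(pooled + Z_975 * se),
            tau2)


# Hypothetical example: three studies reporting ORs of 1.2, 1.5, and 1.3
studies = [math.log(1.2), math.log(1.5), math.log(1.3)]
variances = [0.02, 0.05, 0.03]
pooled_or, low, high, tau2 = dersimonian_laird(studies, variances)
print(f"pooled OR {pooled_or:.2f} (95% CI {low:.2f}-{high:.2f}), tau^2 = {tau2:.3f}")
```

Pooling on the log scale is why the exponentiated interval is asymmetric around the point estimate: an interval such as 1.19-1.53 has a geometric mean of about 1.35 rather than an arithmetic midpoint of 1.35.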

Results: Among 335 studies, 20 were included in this literature review. The pooled effect size was 1.35, with a 95% confidence interval of 1.19-1.53, indicating a statistically significant effect (P = .001). We reported clinical tasks, baseline LLMs, retrieval sources and strategies, as well as evaluation methods.
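To make the retrieval step concrete, the following hypothetical Python sketch ranks a small document collection against a question by bag-of-words cosine similarity and assembles the top matches into a prompt. The reviewed studies use a variety of retrieval sources and strategies (often dense embeddings); the scoring method, function names, and prompt template here are illustrative assumptions, and the LLM call itself is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return up to k documents with nonzero similarity, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(question, corpus, k=2):
    """Ground the question in retrieved passages (hypothetical template)."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, corpus, k))
    return ("Answer using only the context below.\n"
            f"Context:\n{context}\n"
            f"Question: {question}\nAnswer:")


corpus = [
    "Metformin is first-line therapy for type 2 diabetes.",
    "Warfarin requires regular INR monitoring.",
    "Sepsis bundles include early antibiotics.",
]
print(build_prompt("What is first-line therapy for type 2 diabetes?", corpus, k=1))
```

Filtering out zero-similarity passages keeps unrelated text out of the prompt, so the model is not asked to ground its answer in irrelevant context.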

Discussion: Building on our literature review, we developed Guidelines for Unified Implementation and Development of Enhanced LLM Applications with RAG in Clinical Settings to inform clinical applications using RAG.

Conclusion: Overall, RAG implementation was associated with improved performance compared with baseline LLMs (pooled odds ratio, 1.35). Future research should focus on (1) system-level enhancement: combining RAG with agents; (2) knowledge-level enhancement: deeply integrating knowledge into LLMs; and (3) integration-level enhancement: integrating RAG systems within electronic health records.

Citations: 0
An examination of ambulatory care code specificity utilization in ICD-10-CM compared to ICD-9-CM: implications for ICD-11 implementation.
IF 4.7 · Zone 2 (Medicine) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-01-11 · DOI: 10.1093/jamia/ocaf003
Susan H Fenton, Cassandra Ciminello, Vickie M Mays, Mary H Stanfill, Valerie Watzlaf

Objective: The ICD-10-CM classification system contains more specificity than its predecessor ICD-9-CM. A stated reason for transitioning to ICD-10-CM was to increase the availability of detailed data. This study aims to determine whether the increased specificity contained in ICD-10-CM is utilized in the ambulatory care setting and inform an evidence-based approach to evaluate ICD-11 content for implementation planning in the United States.

Materials and methods: Diagnosis codes and text descriptions were extracted from a 25% random sample of the IQVIA Ambulatory EMR-US database for 2014 (ICD-9-CM, n = 14 327 155) and 2019 (ICD-10-CM, n = 13 062 900). Code utilization data were analyzed for the total and unique numbers of codes. Frequencies and significance tests were used to determine, for each code set and year, the percentage of available codes utilized and the rate of unspecified codes.
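A minimal Python sketch of the two utilization measures might look like the following. It is a hypothetical illustration: it assumes a code counts as "unspecified" when its text description contains the word "unspecified," and it computes the unspecified rate over all code occurrences; the study's exact definitions and significance tests are not reproduced here.

```python
def utilization_stats(used_codes, available_codes, descriptions):
    """Return (% of available codes used at least once, % of used codes that are unspecified).

    used_codes: one entry per coded diagnosis (codes may repeat).
    available_codes: every valid code in the code set for that year.
    descriptions: mapping from code to its text description.
    """
    if not used_codes or not available_codes:
        raise ValueError("need at least one used and one available code")
    available = set(available_codes)
    distinct_used = set(used_codes) & available
    pct_available_used = 100.0 * len(distinct_used) / len(available)
    # Assumption: "unspecified" is detected from the description text
    unspecified = [c for c in used_codes
                   if "unspecified" in descriptions.get(c, "").lower()]
    pct_unspecified = 100.0 * len(unspecified) / len(used_codes)
    return pct_available_used, pct_unspecified


# Toy code set (real ICD-10-CM codes, made-up usage)
descriptions = {
    "E11.9": "Type 2 diabetes mellitus without complications",
    "E11.65": "Type 2 diabetes mellitus with hyperglycemia",
    "I10": "Essential (primary) hypertension",
    "J45.909": "Unspecified asthma, uncomplicated",
    "J45.20": "Mild intermittent asthma, uncomplicated",
}
used = ["I10", "I10", "J45.909", "E11.9"]
print(utilization_stats(used, list(descriptions), descriptions))
```

Counting distinct codes for the first measure but occurrences for the second mirrors the distinction between unique and total code counts in the methods; either denominator could be swapped if the study defined them differently.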

Results: Only 44.6% of available ICD-10-CM codes were used, compared with 91.5% of available ICD-9-CM codes. Of all codes used, 14.5% of ICD-9-CM codes were unspecified, compared with 33.3% of ICD-10-CM codes.

Discussion: Even though greater detail is available, the use of unspecified codes increased by 108.5% with ICD-10-CM. The utilization data analyzed in this study do not support a rationale for the large increase in the number of codes in ICD-10-CM. New technologies and methods are likely needed to fully utilize detailed classification systems.

Conclusion: These results help evaluate the content needed in the United States national ICD standard. This analysis of codes in the current ICD standard is important for ICD-11 evaluation, implementation, and use.

Citations: 0
Smart Imitator: Learning from Imperfect Clinical Decisions.
IF 4.7 · Zone 2 (Medicine) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-01-10 · DOI: 10.1093/jamia/ocae320
Dilruk Perera, Siqi Liu, Kay Choong See, Mengling Feng

Objectives: This study introduces Smart Imitator (SI), a 2-phase reinforcement learning (RL) approach for improving personalized treatment policies in healthcare that addresses the challenges posed by imperfect clinician data and complex environments.

Materials and methods: Smart Imitator's first phase uses adversarial cooperative imitation learning with a novel sample selection schema to categorize clinician policies from optimal to nonoptimal. The second phase creates a parameterized reward function to guide the learning of superior treatment policies through RL. Smart Imitator's effectiveness was validated on 2 datasets: a sepsis dataset with 19 711 patient trajectories and a diabetes dataset with 7234 trajectories.

Results: Extensive quantitative and qualitative experiments showed that SI significantly outperformed state-of-the-art baselines in both datasets. For sepsis, SI reduced estimated mortality rates by 19.6% compared to the best baseline. For diabetes, SI reduced HbA1c-High rates by 12.2%. The learned policies aligned closely with successful clinical decisions and deviated strategically when necessary. These deviations aligned with recent clinical findings, suggesting improved outcomes.

Discussion: Smart Imitator advances RL applications by addressing challenges such as imperfect data and environmental complexities, demonstrating effectiveness within the tested conditions of sepsis and diabetes. Further validation across diverse conditions and exploration of additional RL algorithms are needed to enhance precision and generalizability.

Conclusion: This study shows the potential of learning from clinician behavior to advance personalized healthcare and improve treatment outcomes. Its methodology offers a robust approach to adaptive, personalized strategies in various complex and uncertain environments.

Citations: 0
Linking national primary care electronic health records to individual records from the U.S. Census Bureau's American Community Survey: evaluating the likelihood of linkage based on patient health.
IF 4.7 · Zone 2 (Medicine) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-01-01 · DOI: 10.1093/jamia/ocae269
Aubrey Limburg, Nicole Gladish, David H Rehkopf, Robert L Phillips, Victoria Udalova

Objectives: To evaluate the likelihood of linking electronic health records (EHRs) to restricted individual-level American Community Survey (ACS) data based on patient health condition.

Materials and methods: Electronic health records (2019-2021) were derived from a primary care registry collected by the American Board of Family Medicine. At the U.S. Census Bureau, these data were assigned anonymized person-level identifiers (Protected Identification Keys [PIKs]). The records were then linked to restricted individual-level data from the ACS (2005-2022). We used logistic regression to evaluate match rates for patients with health conditions across a range of severity: hypertension, diabetes, and chronic kidney disease.

Results: Among more than 2.8 million patients, 99.2% were assigned person-level identifiers (PIKs). There were some differences in the odds of receiving an identifier in adjusted models for patients with hypertension (OR = 1.70, 95% CI: 1.63, 1.77) and diabetes (OR = 1.17, 95% CI: 1.13, 1.22), relative to those without. There were only small differences in the odds of matching to ACS in adjusted models for patients with hypertension (OR = 1.03, 95% CI: 1.03, 1.04), diabetes (OR = 1.02, 95% CI: 1.01, 1.03), and chronic kidney disease (OR = 1.05, 95% CI: 1.03, 1.06), relative to those without.
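The odds ratios above come from logistic regression models, in which each coefficient is a log odds ratio. The Python sketch below, an illustration rather than the authors' code, converts a coefficient and its standard error into an odds ratio with a Wald 95% confidence interval, and recovers the log-scale standard error implied by a reported interval; the adjustment covariates used in the study are not specified in the abstract and are not modeled.

```python
import math

# 97.5th percentile of the standard normal distribution
Z_975 = 1.959963984540054


def odds_ratio_ci(beta, se, z=Z_975):
    """Convert a logistic-regression coefficient (log odds ratio) to an OR with a Wald CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_or_ci(lower, upper, z=Z_975):
    """Recover the log-scale standard error implied by a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Reported hypertension estimate for receiving a PIK: OR 1.70 (1.63, 1.77)
se = se_from_or_ci(1.63, 1.77)
print(round(se, 4), odds_ratio_ci(math.log(1.70), se))
```

As a consistency check, a Wald interval is symmetric on the log scale, so the point estimate should be close to the geometric mean of the bounds: sqrt(1.63 × 1.77) ≈ 1.70 for the hypertension result.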

Discussion and conclusion: Our work supports evidence-building across government consistent with the Foundations for Evidence-Based Policymaking Act of 2018 and the goal of leveraging data as a strategic asset. Given the high PIK and ACS match rates, with small differences based on health condition, our findings suggest the feasibility of enhancing the utility of EHR data for research focused on health.

Pages: 97-104 · Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648727/pdf/
Citations: 0
Comparison of six natural language processing approaches to assessing firearm access in Veterans Health Administration electronic health records.
IF 4.7 · Zone 2 (Medicine) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-01-01 · DOI: 10.1093/jamia/ocae169
Joshua Trujeque, R Adams Dudley, Nathan Mesfin, Nicholas E Ingraham, Isai Ortiz, Ann Bangerter, Anjan Chakraborty, Dalton Schutte, Jeremy Yeung, Ying Liu, Alicia Woodward-Abel, Emma Bromley, Rui Zhang, Lisa A Brenner, Joseph A Simonetti

Objective: Access to firearms is associated with increased suicide risk. Our aim was to develop a natural language processing approach to characterizing firearm access in clinical records.

Materials and methods: We used clinical notes from 36 685 Veterans Health Administration (VHA) patients between April 10, 2023 and April 10, 2024. We expanded preexisting firearm term sets using subject matter experts and generated 250-character snippets around each firearm term appearing in notes. Annotators labeled 3000 snippets into three classes. Using these annotated snippets, we compared four nonneural machine learning models (random forest, bagging, gradient boosting, logistic regression with ridge penalization) and two versions of Bidirectional Encoder Representations from Transformers, or BERT (specifically, BioBERT and Bio-ClinicalBERT) for classifying firearm access as "definite access", "definitely no access", or "other".
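The snippet-generation step can be sketched in Python as follows. The term list is hypothetical (the study's expanded firearm term sets are not given in the abstract), matching is assumed to be case-insensitive on whole words, and each 250-character window is assumed to be centered on the matched term and shifted inward at note boundaries.

```python
import re


def extract_snippets(note, terms, width=250):
    """Return a width-character window around each whole-word match of any term."""
    # Longest terms first so that, e.g., "handgun" is preferred over "gun"
    alternatives = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(note):
        center = (match.start() + match.end()) // 2
        # Center the window on the term, then shift it inward at note boundaries
        start = max(0, center - width // 2)
        end = min(len(note), start + width)
        start = max(0, end - width)
        snippets.append(note[start:end])
    return snippets


terms = ["firearm", "gun", "handgun", "rifle"]  # hypothetical term set
note = "Safety plan reviewed. Patient states he keeps a handgun locked in a safe."
print(extract_snippets(note, terms))
```

Word boundaries keep names such as "Gunnar" from producing false matches; each resulting snippet would then be labeled "definite access," "definitely no access," or "other."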

Results: Firearm terms were identified in 36 685 patient records (41.3%). Of the snippets, 33.7% were categorized as "definite access," 9.0% as "definitely no access," and 57.2% as "other." Among the models classifying firearm access, five of six had acceptable performance; BioBERT and Bio-ClinicalBERT performed best, with F1 scores of 0.876 (95% confidence interval, 0.874-0.879) and 0.896 (95% confidence interval, 0.894-0.899), respectively.
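The abstract does not state whether the reported F1 scores are macro-averaged, support-weighted, or computed otherwise across the three classes. The hypothetical Python helper below computes per-class F1 along with both macro and weighted averages, which can differ noticeably when classes are imbalanced, as they are here (57.2% "other").

```python
def f1_report(y_true, y_pred, labels):
    """Per-class F1 plus macro- and support-weighted averages."""
    pairs = list(zip(y_true, y_pred))
    per_class, support = {}, {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_class[label] = (2 * precision * recall / (precision + recall)
                            if precision + recall else 0.0)
        support[label] = tp + fn  # number of true instances of this class
    macro = sum(per_class.values()) / len(labels)
    total = sum(support.values())
    weighted = (sum(per_class[label] * support[label] for label in labels) / total
                if total else 0.0)
    return per_class, macro, weighted


labels = ["definite access", "definitely no access", "other"]
y_true = ["definite access", "definite access", "definitely no access", "other"]
y_pred = ["definite access", "other", "definitely no access", "other"]
per_class, macro, weighted = f1_report(y_true, y_pred, labels)
print(per_class, round(macro, 3), round(weighted, 3))
```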

Discussion and conclusion: Firearm-related terminology is common in the clinical records of VHA patients. The ability to use text to identify and characterize patients' firearm access could enhance suicide prevention efforts, and five of our six models could be used to identify patients for clinical interventions.

Pages: 113-118 · Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648724/pdf/
Citations: 0
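Before annotation and classification, the firearm-access study generated 250-character snippets around each firearm term found in a note. The sketch below shows one way such a windowing step could look. Several details are illustrative assumptions rather than the study's method: the short term list, the centred window, and case-insensitive whole-word matching with an optional plural "s". The study's expert-expanded term sets and its exact windowing rule are not reproduced.

```python
import re

# Illustrative, hypothetical term list; not the study's expert-expanded term sets.
FIREARM_TERMS = ("firearm", "gun", "rifle", "handgun", "pistol")


def extract_snippets(note: str, terms=FIREARM_TERMS, width: int = 250) -> list[str]:
    """Return one snippet of at most `width` characters per term match.

    Each window is centred on the matched term and clipped at the note
    boundaries (an assumed windowing rule).
    """
    # Whole-word, case-insensitive match, optionally pluralised with "s".
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, terms)) + r")s?\b", re.IGNORECASE
    )
    half = width // 2
    snippets = []
    for match in pattern.finditer(note):
        centre = (match.start() + match.end()) // 2
        start = max(0, centre - half)
        end = min(len(note), centre + half)
        snippets.append(note[start:end])
    return snippets


note = "Patient reports he keeps a handgun locked in a safe at home. Denies other firearms."
print(len(extract_snippets(note)))  # 2
```

Because "gun" must start at a word boundary, it does not fire inside "handgun". A real term set would need similar care to avoid double-counting overlapping terms.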
Mini-mental status examination phenotyping for Alzheimer's disease patients using both structured and narrative electronic health record features.
IF 4.7 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-01 | DOI: 10.1093/jamia/ocae274
Betina Idnay, Gongbo Zhang, Fangyi Chen, Casey N Ta, Matthew W Schelke, Karen Marder, Chunhua Weng

Objective: This study aims to automate the prediction of Mini-Mental State Examination (MMSE) scores, a widely adopted standard for cognitive assessment in patients with Alzheimer's disease, using natural language processing (NLP) and machine learning (ML) on structured and unstructured EHR data.

Materials and methods: We extracted demographic data, diagnoses, medications, and unstructured clinical visit notes from the EHRs. We used Latent Dirichlet Allocation (LDA) for topic modeling and Term Frequency-Inverse Document Frequency (TF-IDF) to weight n-gram features. In addition, we extracted meta-features such as age, ethnicity, and race. Models were trained and evaluated with eXtreme Gradient Boosting (XGBoost), a Stochastic Gradient Descent Regressor (SGDRegressor), and a Multi-Layer Perceptron (MLP).
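The sketch below gives a rough illustration of TF-IDF weighting over n-grams. It scores unigrams and bigrams as raw term frequency times log(N / df), where N is the number of notes and df is the number of notes that contain the n-gram. Whitespace tokenization, the 1-2 n-gram range, and this unsmoothed idf are assumptions; the abstract does not specify the study's preprocessing or TF-IDF variant.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Yield all contiguous n-grams of length 1..n_max as space-joined strings."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def tfidf(docs, n_max=2):
    """Return one {ngram: weight} dict per document.

    tf  = raw count of the n-gram in the document
    idf = log(N / df), df = number of documents containing the n-gram
    """
    counts = [Counter(ngrams(doc.lower().split(), n_max)) for doc in docs]
    # Iterating each Counter's keys counts every n-gram once per document.
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]


notes = ["memory loss worsening", "memory loss stable", "gait stable"]
weights = tfidf(notes)
print(sorted(weights[0].items(), key=lambda kv: -kv[1])[:2])
```

Library implementations differ in detail. For example, scikit-learn's `TfidfVectorizer(ngram_range=(1, 2))` uses a smoothed idf and L2-normalizes each row by default.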

Results: We analyzed 1654 clinical visit notes collected between September 2019 and June 2023 for 1000 Alzheimer's disease patients. The average MMSE score was 20. Patients had a mean age of 76.4 years, 54.7% were female, and 54.7% identified as White. The best-performing model (ie, the one with the lowest root mean squared error [RMSE]) was the MLP, which achieved an RMSE of 5.53 on the validation set using n-gram features and outperformed the other models and feature sets. Its RMSE on the test set was 5.85.
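The models were ranked by RMSE. The sketch below computes it for a handful of made-up MMSE scores. RMSE is expressed in the same units as the score, and the MMSE ranges from 0 to 30. The reported test RMSE of 5.85 can therefore be read as a typical error of roughly 6 points, with large misses weighted more heavily than small ones.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat) ** 2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical MMSE scores (0-30 scale) and model predictions, for illustration only.
observed = [24, 18, 27, 12]
predicted = [21, 22, 25, 17]
print(round(rmse(observed, predicted), 2))  # 3.67
```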

Discussion: This study developed an ML method to predict MMSE scores from unstructured clinical notes, demonstrating the feasibility of using NLP to support cognitive assessment. Future work should focus on refining the model and evaluating its clinical relevance across diverse settings.

Conclusion: We contributed a model for automating MMSE estimation using EHR features, potentially transforming cognitive assessment for Alzheimer's patients and paving the way for more informed clinical decisions and cohort identification.

Pages: 119-128 | PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648712/pdf/
Citations: 0
Large language model uncertainty proxies: discrimination and calibration for medical diagnosis and treatment.
IF 4.7 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-01 | DOI: 10.1093/jamia/ocae254
Thomas Savage, John Wang, Robert Gallo, Abdessalem Boukil, Vishwesh Patel, Seyed Amir Ahmad Safavi-Naini, Ali Soroush, Jonathan H Chen

Introduction: The inability of large language models (LLMs) to communicate uncertainty is a significant barrier to their use in medicine. Before LLMs can be integrated into patient care, the field must assess methods to estimate uncertainty in ways that are useful to physician-users.

Objective: To evaluate how well uncertainty proxies quantify LLM confidence in diagnosis and treatment selection tasks, by assessing their discrimination and calibration.

Methods: We examined confidence elicitation (CE), token-level probability (TLP), and sample consistency (SC) proxies across GPT-3.5, GPT-4, Llama 2, and Llama 3. The proxies were evaluated on 3 datasets of open-ended patient scenarios.
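Sample consistency uses agreement among repeated answers to the same prompt as a confidence signal. The study judged agreement using sentence embeddings or GPT annotation. To keep the example self-contained, the sketch below substitutes exact matching after light text normalization. The sampled answers are hard-coded stand-ins for repeated LLM calls.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Crude normalization: lowercase, collapse whitespace, drop a trailing period."""
    return " ".join(answer.lower().split()).rstrip(".")


def sample_consistency(samples: list[str]) -> tuple[str, float]:
    """Return the modal answer and the fraction of samples that agree with it."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(normalize(s) for s in samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)


# Hypothetical answers from five samples of the same diagnostic prompt.
samples = [
    "Pulmonary embolism",
    "pulmonary embolism.",
    "Pneumonia",
    "Pulmonary  embolism",
    "Pulmonary embolism",
]
answer, confidence = sample_consistency(samples)
print(answer, confidence)  # pulmonary embolism 0.8
```

Exact matching undercounts agreement when answers are paraphrased ("PE" vs "pulmonary embolism"). Semantic comparison, such as the embedding or annotation approaches named above, addresses this.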

Results: SC showed better discrimination than TLP and CE. SC by sentence embedding achieved the highest discrimination (ROC AUC 0.68-0.79) but was poorly calibrated. SC by GPT annotation achieved the second-best discrimination (ROC AUC 0.66-0.74) and was accurately calibrated. Verbalized confidence (CE) consistently overestimated model confidence.
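Discrimination asks whether higher confidence goes with correct answers. Calibration asks whether a stated confidence of, say, 0.8 is right about 80% of the time. The sketch below computes ROC AUC from its pairwise (Mann-Whitney) definition and an expected calibration error (ECE) over equal-width bins. ECE, the 10-bin setting, and the data are assumptions; the abstract does not say which calibration summary the authors used.

```python
def roc_auc(confidences, correct):
    """Probability that a random correct answer has higher confidence than a
    random incorrect one (ties count 1/2). This equals the ROC AUC."""
    pos = [c for c, y in zip(confidences, correct) if y]
    neg = [c for c, y in zip(confidences, correct) if not y]
    if not pos or not neg:
        raise ValueError("need both correct and incorrect answers")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |accuracy - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        bins[idx].append((c, y))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            acc = sum(y for _, y in b) / len(b)
            conf = sum(c for c, _ in b) / len(b)
            ece += len(b) / total * abs(acc - conf)
    return ece


# Hypothetical proxy scores and correctness labels (1 = correct answer).
conf = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
correct = [1, 1, 0, 1, 0, 0]
print(round(roc_auc(conf, correct), 3))  # 0.889
```

A proxy can discriminate well yet be miscalibrated, as reported for SC by sentence embedding. Monotone recalibration against reference cases can fix the calibration without changing the ranking that AUC measures.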

Discussion and conclusions: Of the proxies evaluated, SC is the most effective for estimating LLM uncertainty. SC by sentence embedding can estimate uncertainty effectively if the user has a set of reference cases with which to recalibrate the results. SC by GPT annotation is preferable if the user has no reference cases and needs accurate raw calibration. Our results confirm that LLMs are consistently overconfident when verbalizing their confidence (CE).

Pages: 139-149 | PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648734/pdf/
Citations: 0
Application of a digital quality measure for cancer diagnosis in Epic Cosmos.
IF 4.7 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-01 | DOI: 10.1093/jamia/ocae253
Andrew J Zimolzak, Sundas P Khan, Hardeep Singh, Jessica A Davila

Objectives: Missed and delayed cancer diagnoses are common, harmful, and often preventable. We previously validated a digital quality measure (dQM) of emergency presentation (EP) of lung cancer in 2 US health systems. This study aimed to apply the dQM to a new national electronic health record (EHR) database and examine demographic associations.

Materials and methods: We applied the dQM (emergency encounter followed by new lung cancer diagnosis within 30 days) to Epic Cosmos, a deidentified database covering 184 million US patients. We examined dQM associations with sociodemographic factors.
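The dQM rule can be expressed as a simple date comparison. The sketch below flags a newly diagnosed patient as an emergency presentation if an emergency encounter occurred in the 30 days up to and including the diagnosis date. It then reports the share of newly diagnosed patients who were flagged. The record layout, the inclusive window boundaries, and the choice of newly diagnosed patients as the denominator are assumptions. The abstract does not describe how the measure was queried in Epic Cosmos or how a diagnosis was judged to be new.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)


def is_emergency_presentation(diagnosis_date: date, ed_visit_dates: list[date]) -> bool:
    """True if any emergency encounter falls within the 30 days up to and
    including the new lung cancer diagnosis date (assumed inclusive window)."""
    return any(diagnosis_date - WINDOW <= v <= diagnosis_date for v in ed_visit_dates)


def ep_rate(patients: dict) -> float:
    """Share of newly diagnosed patients who meet the EP criterion.

    `patients` maps a hypothetical patient id to
    (new_lung_cancer_diagnosis_date, [emergency_encounter_dates]).
    """
    if not patients:
        raise ValueError("no newly diagnosed patients")
    hits = sum(is_emergency_presentation(dx, eds) for dx, eds in patients.values())
    return hits / len(patients)


# Hypothetical records for four newly diagnosed patients.
patients = {
    "a": (date(2024, 3, 10), [date(2024, 2, 20)]),  # ED visit 19 days before: EP
    "b": (date(2024, 3, 10), [date(2024, 1, 1)]),   # 69 days before: not EP
    "c": (date(2024, 3, 10), []),                   # no ED visit: not EP
    "d": (date(2024, 3, 10), [date(2024, 3, 12)]),  # ED visit after diagnosis: not EP
}
print(ep_rate(patients))  # 0.25
```

Stratifying the same calculation by a sociodemographic field (for example, computing `ep_rate` separately per group) would give group-level rates like those reported in the Results.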

Results: The overall EP rate was 19.6%. EP rates were higher in Black than in White patients (24% vs 19%, P < .001). They were also higher in patients who were younger, had higher social vulnerability, lived in lower-income ZIP codes, or self-reported transportation difficulties.

Discussion: We successfully applied a dQM based on cancer EP to the largest US EHR database.

Conclusion: This dQM could be a marker for sociodemographic vulnerabilities in cancer diagnosis.

Pages: 227-229 | PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648705/pdf/
Citations: 0
Learning health system linchpins: information exchange and a common data model.
IF 4.7 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-01 | DOI: 10.1093/jamia/ocae277
Aaron S Eisman, Elizabeth S Chen, Wen-Chih Wu, Karen M Crowley, Dilum P Aluthge, Katherine Brown, Indra Neil Sarkar

Objective: To demonstrate the potential for a centrally managed health information exchange standardized to a common data model (HIE-CDM) to facilitate semantic data flow needed to support a learning health system (LHS).

Materials and methods: The Rhode Island Quality Institute operates the Rhode Island (RI) statewide HIE, which aggregates RI health data for more than half of the state's population from 47 data partners. We standardized HIE data to the Observational Medical Outcomes Partnership (OMOP) CDM. Atherosclerotic cardiovascular disease (ASCVD) risk and primary prevention practices were selected to demonstrate LHS semantic data flow from 2013 to 2023.

Results: We calculated longitudinal 10-year ASCVD risk for 62,999 individuals. Nearly two-thirds had ASCVD risk factors recorded by more than one data partner. This enabled granular tracking of individual ASCVD risk, primary prevention (ie, statin therapy), and incident disease. The population was on statins for fewer than half of the guideline-recommended days. Individuals receiving care at Federally Qualified Health Centers were more likely both to have unfavorable ASCVD risk profiles and to be on statins. CDM transformation reduced data heterogeneity by producing a unified health record that adheres to the defined terminologies of each OMOP domain.
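The statin finding is a days-covered measure. The sketch below uses (start date, days supplied) pairs to compute the fraction of days in a follow-up window on which a patient had statin supply. In OMOP terms, such pairs could come from DRUG_EXPOSURE records (drug_exposure_start_date, days_supply). Several elements are assumptions rather than the study's method: the window, the example fills, and the choice not to shift overlapping fills forward (some proportion-of-days-covered definitions do shift them).

```python
from datetime import date, timedelta


def days_covered_fraction(exposures, window_start: date, window_end: date) -> float:
    """Fraction of days in [window_start, window_end] covered by at least one
    exposure, where each exposure is a (start_date, days_supply) pair.

    Overlapping supplies are not carried forward; each day counts once.
    """
    total_days = (window_end - window_start).days + 1
    if total_days <= 0:
        raise ValueError("window_end must not precede window_start")
    covered = set()
    for start, supply in exposures:
        for offset in range(supply):
            day = start + timedelta(days=offset)
            if window_start <= day <= window_end:
                covered.add(day)
    return len(covered) / total_days


# Hypothetical statin fills: two 90-day supplies in a one-year window.
fills = [(date(2023, 1, 1), 90), (date(2023, 7, 1), 90)]
print(round(days_covered_fraction(fills, date(2023, 1, 1), date(2023, 12, 31)), 3))  # 0.493
```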

Discussion: We demonstrated the potential for an HIE-CDM to enable observational population health research. We also showed how to leverage existing health information technology infrastructure and health data best practices to break down LHS barriers.

Conclusion: HIE-CDM facilitates knowledge curation and health system intervention development at the individual, health system, and population levels.

Pages: 9-19 | PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648737/pdf/
Citations: 0
Copyright © 2023 Book学术 All rights reserved.