Journal of the American Medical Informatics Association: latest articles

RAMIE: retrieval-augmented multi-task information extraction with large language models on dietary supplements.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-11 DOI: 10.1093/jamia/ocaf002
Zaifu Zhan, Shuang Zhou, Mingchen Li, Rui Zhang

Objective: To develop an advanced multi-task large language model (LLM) framework for extracting diverse types of information about dietary supplements (DSs) from clinical records.

Methods: We focused on 4 core DS information extraction tasks: named entity recognition (2 949 clinical sentences), relation extraction (4 892 sentences), triple extraction (2 949 sentences), and usage classification (2 460 sentences). To address these tasks, we introduced the retrieval-augmented multi-task information extraction (RAMIE) framework, which incorporates: (1) instruction fine-tuning with task-specific prompts; (2) multi-task training of LLMs to enhance storage efficiency and reduce training costs; and (3) retrieval-augmented generation, which retrieves similar examples from the training set to improve task performance. We compared the performance of RAMIE to LLMs with instruction fine-tuning alone and conducted an ablation study to evaluate the individual contributions of multi-task learning and retrieval-augmented generation to overall performance improvements.
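The retrieval step in component (3) can be illustrated with a minimal sketch. The abstract does not say which retriever or prompt template RAMIE uses, so everything below is an assumption for illustration: a bag-of-words cosine similarity stands in for the retriever, and the names `retrieve`, `build_prompt`, and `TRAIN`, along with the toy sentences, are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, train_set, k=2):
    """Return the k training examples whose sentences are most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(
        train_set,
        key=lambda ex: cosine(q, bag_of_words(ex["sentence"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(instruction, query, examples):
    """Place retrieved demonstrations between the task instruction and the query."""
    demos = "\n".join(
        f"Input: {ex['sentence']}\nOutput: {ex['label']}" for ex in examples
    )
    return f"{instruction}\n{demos}\nInput: {query}\nOutput:"


# Hypothetical training examples, invented for illustration only.
TRAIN = [
    {"sentence": "Patient takes fish oil daily for cholesterol.", "label": "fish oil"},
    {"sentence": "She stopped taking melatonin last week.", "label": "melatonin"},
    {"sentence": "No acute distress noted on exam.", "label": "none"},
]

query = "Patient takes vitamin D daily."
print(build_prompt("Extract dietary supplement names.", query, retrieve(query, TRAIN, k=1)))
```

A dense sentence encoder would likely replace the bag-of-words similarity in practice; the point here is only the shape of the pipeline: rank training examples by similarity to the input, then prepend the top matches to the task-specific instruction.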

Results: Using the RAMIE framework, Llama2-13B achieved an F1 score of 87.39 on the named entity recognition task, reflecting a 3.51% improvement. It also excelled in the relation extraction task with an F1 score of 93.74, a 1.15% improvement. For the triple extraction task, Llama2-7B achieved an F1 score of 79.45, representing a significant 14.26% improvement. MedAlpaca-7B delivered the highest F1 score of 93.45 on the usage classification task, with a 0.94% improvement. The ablation study highlighted that while multi-task learning improved efficiency with a minor trade-off in performance, the inclusion of retrieval-augmented generation significantly enhanced overall accuracy across tasks.

Conclusion: The RAMIE framework demonstrates substantial improvements in multi-task information extraction for DS-related data from clinical records.

Citations: 0
An examination of ambulatory care code specificity utilization in ICD-10-CM compared to ICD-9-CM: implications for ICD-11 implementation.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-11 DOI: 10.1093/jamia/ocaf003
Susan H Fenton, Cassandra Ciminello, Vickie M Mays, Mary H Stanfill, Valerie Watzlaf

Objective: The ICD-10-CM classification system contains more specificity than its predecessor ICD-9-CM. A stated reason for transitioning to ICD-10-CM was to increase the availability of detailed data. This study aims to determine whether the increased specificity contained in ICD-10-CM is utilized in the ambulatory care setting and inform an evidence-based approach to evaluate ICD-11 content for implementation planning in the United States.

Materials and methods: Diagnosis codes and text descriptions were extracted from a 25% random sample of the IQVIA Ambulatory EMR-US database for 2014 (ICD-9-CM, n = 14 327 155) and 2019 (ICD-10-CM, n = 13 062 900). Code utilization data was analyzed for the total and unique number of codes. Frequencies and tests of significance determined the percentage of available codes utilized and the unspecified code rates for both code sets in each year.
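The two summary measures named above (share of available codes utilized, and the unspecified code rate) can be sketched as follows. This is a toy illustration, not the study's analysis: the code labels and descriptions are invented, flagging a code as unspecified by searching its description for the word "unspecified" is an assumption, and so is computing the unspecified rate over all recorded occurrences rather than over distinct codes.

```python
def utilization_rate(observed_codes, available):
    """Percentage of available codes that appear at least once in the data."""
    used = set(observed_codes) & set(available)
    return 100.0 * len(used) / len(available)


def unspecified_rate(observed_codes, available):
    """Percentage of recorded code occurrences whose description says 'unspecified'."""
    flagged = sum(
        1 for code in observed_codes if "unspecified" in available[code].lower()
    )
    return 100.0 * flagged / len(observed_codes)


# Hypothetical code set: labels and descriptions are placeholders, not real ICD codes.
AVAILABLE = {
    "C1": "Condition A, unspecified",
    "C2": "Condition A, left side",
    "C3": "Condition A, right side",
    "C4": "Condition B",
}
# Hypothetical diagnosis records, one entry per recorded diagnosis.
OBSERVED = ["C1", "C1", "C2", "C4"]

print(utilization_rate(OBSERVED, AVAILABLE))  # 3 of 4 available codes are used
print(unspecified_rate(OBSERVED, AVAILABLE))  # 2 of 4 occurrences are unspecified
```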

Results: Only 44.6% of available ICD-10-CM codes were used compared to 91.5% of available ICD-9-CM codes. Of the total codes used, 14.5% of ICD-9-CM codes were unspecified, while 33.3% of ICD-10-CM codes were unspecified.

Discussion: Even though greater detail is available, a 108.5% increase in using unspecified codes with ICD-10-CM was found. The utilization data analyzed in this study does not support a rationale for the large increase in the number of codes in ICD-10-CM. New technologies and methods are likely needed to fully utilize detailed classification systems.

Conclusion: These results help evaluate the content needed in the United States national ICD standard. This analysis of codes in the current ICD standard is important for ICD-11 evaluation, implementation, and use.

Citations: 0
Objective study validity diagnostics: a framework requiring pre-specified, empirical verification to increase trust in the reliability of real-world evidence.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-10 DOI: 10.1093/jamia/ocae317
Mitchell M Conover, Patrick B Ryan, Yong Chen, Marc A Suchard, George Hripcsak, Martijn J Schuemie

Objective: Propose a framework to empirically evaluate and report validity of findings from observational studies using pre-specified objective diagnostics, increasing trust in real-world evidence (RWE).

Materials and methods: The framework employs objective diagnostic measures to assess the appropriateness of study designs, analytic assumptions, and threats to validity in generating reliable evidence addressing causal questions. Diagnostic evaluations should be interpreted before the unblinding of study results or, alternatively, only unblind results from analyses that pass pre-specified thresholds. We provide a conceptual overview of objective diagnostic measures and demonstrate their impact on the validity of RWE from a large-scale comparative new-user study of various antihypertensive medications. We evaluated expected absolute systematic error (EASE) before and after applying diagnostic thresholds, using a large set of negative control outcomes.
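The abstract does not spell out how EASE is computed. One way to read the name, offered here only as a sketch under stated assumptions, is the expected absolute value of a systematic error that has been summarized as a normal distribution with mean `mu` and standard deviation `sigma` (for example, fitted to negative control estimates on the log scale). The fitting step is omitted, and the function name `expected_absolute_error` is hypothetical. Under that assumption the quantity is the mean of a folded normal distribution:

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_absolute_error(mu, sigma):
    """E|X| for X ~ Normal(mu, sigma^2), i.e. the mean of a folded normal.

    Assumption: the systematic error distribution is normal; mu and sigma
    are taken as given rather than fitted here.
    """
    if sigma == 0:
        return abs(mu)  # degenerate case: all mass sits at mu
    density_term = (
        sigma * math.sqrt(2.0 / math.pi) * math.exp(-mu * mu / (2.0 * sigma * sigma))
    )
    shift_term = mu * (1.0 - 2.0 * normal_cdf(-mu / sigma))
    return density_term + shift_term


print(expected_absolute_error(0.0, 0.0))  # no bias and no spread gives zero
print(expected_absolute_error(0.1, 0.3))  # any bias or spread gives a positive value
```

In this reading, a value of zero corresponds to a fitted distribution with no bias and no spread, and larger values indicate more residual systematic error.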

Results: Applying objective diagnostics reduces bias and improves evidence reliability in observational studies. Among 11 716 analyses (EASE = 0.38), 13.9% met pre-specified diagnostic thresholds which reduced EASE to zero. Objective diagnostics provide a comprehensive and empirical set of tests that increase confidence when passed and raise doubts when failed.

Discussion: The increasing use of real-world data presents a scientific opportunity; however, the complexity of the evidence generation process poses challenges for understanding study validity and trusting RWE. Deploying objective diagnostics is crucial to reducing bias and improving reliability in RWE generation. Under ideal conditions, multiple study designs pass diagnostics and generate consistent results, deepening understanding of causal relationships. Open-source, standardized programs can facilitate implementation of diagnostic analyses.

Conclusion: Objective diagnostics are a valuable addition to the RWE generation process.

Citations: 0
Smart Imitator: Learning from Imperfect Clinical Decisions.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-10 DOI: 10.1093/jamia/ocae320
Dilruk Perera, Siqi Liu, Kay Choong See, Mengling Feng

Objectives: This study introduces Smart Imitator (SI), a 2-phase reinforcement learning (RL) solution enhancing personalized treatment policies in healthcare, addressing challenges from imperfect clinician data and complex environments.

Materials and methods: Smart Imitator's first phase uses adversarial cooperative imitation learning with a novel sample selection schema to categorize clinician policies from optimal to nonoptimal. The second phase creates a parameterized reward function to guide the learning of superior treatment policies through RL. Smart Imitator's effectiveness was validated on 2 datasets: a sepsis dataset with 19 711 patient trajectories and a diabetes dataset with 7234 trajectories.

Results: Extensive quantitative and qualitative experiments showed that SI significantly outperformed state-of-the-art baselines in both datasets. For sepsis, SI reduced estimated mortality rates by 19.6% compared to the best baseline. For diabetes, SI reduced HbA1c-High rates by 12.2%. The learned policies aligned closely with successful clinical decisions and deviated strategically when necessary. These deviations aligned with recent clinical findings, suggesting improved outcomes.

Discussion: Smart Imitator advances RL applications by addressing challenges such as imperfect data and environmental complexities, demonstrating effectiveness within the tested conditions of sepsis and diabetes. Further validation across diverse conditions and exploration of additional RL algorithms are needed to enhance precision and generalizability.

Conclusion: This study shows potential in advancing personalized healthcare learning from clinician behaviors to improve treatment outcomes. Its methodology offers a robust approach for adaptive, personalized strategies in various complex and uncertain environments.

Citations: 0
Utility of word embeddings from large language models in medical diagnosis.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-09 DOI: 10.1093/jamia/ocae314
Shahram Yazdani, Ronald Claude Henry, Avery Byrne, Isaac Claude Henry

Objective: This study evaluates the utility of word embeddings, generated by large language models (LLMs), for medical diagnosis by comparing the semantic proximity of symptoms to their eponymic disease embedding ("eponymic condition") and the mean of all symptom embeddings associated with a disease ("ensemble mean").

Materials and methods: Symptom data for 5 diagnostically challenging pediatric diseases (CHARGE syndrome, Cowden disease, POEMS syndrome, Rheumatic fever, and Tuberous sclerosis) were collected from PubMed. Using the Ada-002 embedding model, disease names and symptoms were translated into vector representations in a high-dimensional space. Euclidean and Chebyshev distance metrics were used to classify symptoms based on their proximity to both the eponymic condition and the ensemble mean of the condition's symptoms.
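The ensemble-mean classification described above can be sketched with toy vectors. This is an illustration under assumptions, not the study's code: the 3-dimensional vectors stand in for real embeddings (no embedding model is called), and the disease labels and the names `classify`, `SYMPTOMS`, and `ENSEMBLE_MEANS` are hypothetical.

```python
import math


def euclidean(u, v):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def chebyshev(u, v):
    """Largest absolute difference along any single dimension."""
    return max(abs(a - b) for a, b in zip(u, v))


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(symptom_vec, references, distance):
    """Return the disease whose reference vector is closest to the symptom vector."""
    return min(references, key=lambda d: distance(symptom_vec, references[d]))


# Toy symptom vectors per disease, invented for illustration only.
SYMPTOMS = {
    "disease_a": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "disease_b": [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]],
}
# Ensemble mean: the average of all symptom vectors associated with a disease.
ENSEMBLE_MEANS = {d: mean_vector(vs) for d, vs in SYMPTOMS.items()}

new_symptom = [0.7, 0.3, 0.0]
print(classify(new_symptom, ENSEMBLE_MEANS, euclidean))
print(classify(new_symptom, ENSEMBLE_MEANS, chebyshev))
```

The eponymic-condition variant would use the same `classify` call with a dictionary that maps each disease to the embedding of its name instead of the mean of its symptom embeddings.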

Results: The ensemble mean approach showed significantly higher classification accuracy, correctly classifying between 80% (Cowden disease) and 100% (Tuberous sclerosis) of the sample disease symptoms using the Euclidean distance metric. In contrast, the eponymic condition approach using the Euclidean and Chebyshev distance metrics, in general, showed poor symptom classification performance, with erratic results (0%-100% accuracy), largely ranging between 0% and 3% accuracy.

Discussion: The ensemble mean captures a disease's collective symptom profile, providing a more nuanced representation than the disease name alone. However, some misclassifications were due to superficial semantic similarities, highlighting the need for LLM models trained on medical corpora.

Conclusion: The ensemble mean of symptom embeddings improves classification accuracy over the eponymic condition approach. Future efforts should focus on medical-specific training of LLMs to enhance their diagnostic accuracy and clinical utility.

Citations: 0
AI as an intervention: improving clinical outcomes relies on a causal approach to AI development and validation.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-07 DOI: 10.1093/jamia/ocae301
Shalmali Joshi, Iñigo Urteaga, Wouter A C van Amsterdam, George Hripcsak, Pierre Elias, Benjamin Recht, Noémie Elhadad, James Fackler, Mark P Sendak, Jenna Wiens, Kaivalya Deshpande, Yoav Wald, Madalina Fiterau, Zachary Lipton, Daniel Malinsky, Madhur Nayan, Hongseok Namkoong, Soojin Park, Julia E Vogt, Rajesh Ranganath

The primary practice of healthcare artificial intelligence (AI) starts with model development, often using state-of-the-art AI, retrospectively evaluated using metrics lifted from the AI literature like AUROC and DICE score. However, good performance on these metrics may not translate to improved clinical outcomes. Instead, we argue for a better development pipeline constructed by working backward from the end goal of positively impacting clinically relevant outcomes using AI, leading to considerations of causality in model development and validation, and subsequently a better development pipeline. Healthcare AI should be "actionable," and the change in actions induced by AI should improve outcomes. Quantifying the effect of changes in actions on outcomes is causal inference. The development, evaluation, and validation of healthcare AI should therefore account for the causal effect of intervening with the AI on clinically relevant outcomes. Using a causal lens, we make recommendations for key stakeholders at various stages of the healthcare AI pipeline. Our recommendations aim to increase the positive impact of AI on clinical outcomes.

Citations: 0
The substance-exposed birthing person-infant/child dyad and health information exchange in the United States.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-02 DOI: 10.1093/jamia/ocae315
Fabienne C Bourgeois, Amrita Sinha, Gaurav Tuli, Marvin B Harper, Virginia K Robbins, Sydney Jeffrey, John S Brownstein, Shahla M Jilani

Objective: Timely access to data is needed to improve care for substance-exposed birthing persons and their infants, a significant public health problem in the United States. We examined the current state of birthing person and infant/child (dyad) data-sharing capabilities supported by health information exchange (HIE) standards and HIE network capabilities for data exchange to inform point-of-care needs assessment for the substance-exposed dyad.

Material and methods: A cross-map analysis was performed using a set of dyadic data elements focused on pediatric development and longitudinal supportive care for substance-exposed dyads (70 birthing person and 110 infant/child elements). Cross-mapping was conducted to identify definitional alignment to standardized data fields within national healthcare data exchange standards, the United States Core Data for Interoperability (USCDI) version 4 (v4) and Fast Healthcare Interoperability Resources (FHIR) release 4 (R4), and applicable structured vocabulary standards or terminology associated with USCDI. Subsequent survey analysis examined representative HIE network sharing capabilities, focusing on USCDI and FHIR usage.

Results: 91.11% of dyadic data elements cross-mapped to at least 1 USCDI v4 standardized data field (87.80% of those structured) and 88.89% to FHIR R4. 75% of the surveyed HIE networks reported supporting USCDI versions 1 or 2 and the capability to use FHIR, though demand is limited.
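
The coverage figures above are simple proportions of the 180 dyadic data elements (70 birthing person plus 110 infant/child). As a rough sketch of that arithmetic, the snippet below recomputes them. The mapped counts (164 and 160) are assumptions back-calculated from the reported percentages; the abstract does not state them, and `coverage_percent` is a hypothetical helper name.

```python
def coverage_percent(mapped: int, total: int) -> float:
    """Return the percentage of elements that cross-mapped, rounded to 2 decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * mapped / total, 2)


# 70 birthing-person elements + 110 infant/child elements (from the abstract).
TOTAL_ELEMENTS = 70 + 110

# Assumed counts, back-calculated from the reported percentages (not reported directly).
USCDI_MAPPED = 164
FHIR_MAPPED = 160

uscdi_pct = coverage_percent(USCDI_MAPPED, TOTAL_ELEMENTS)
fhir_pct = coverage_percent(FHIR_MAPPED, TOTAL_ELEMENTS)
print(f"USCDI v4: {uscdi_pct}%  FHIR R4: {fhir_pct}%")
```

Under these assumed counts the two values round to the percentages given in the Results.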

Discussion: HIE of clinical and supportive care data for substance-exposed dyads is supported by current national standards, though limitations exist.

Conclusion: These findings offer a dyadic-focused framework for electronic health record-centered data exchange to inform bedside care longitudinally across clinical touchpoints and population-level health.

Citations: 0
Linking national primary care electronic health records to individual records from the U.S. Census Bureau's American Community Survey: evaluating the likelihood of linkage based on patient health.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-01 DOI: 10.1093/jamia/ocae269
Aubrey Limburg, Nicole Gladish, David H Rehkopf, Robert L Phillips, Victoria Udalova

Objectives: To evaluate the likelihood of linking electronic health records (EHRs) to restricted individual-level American Community Survey (ACS) data based on patient health condition.

Materials and methods: Electronic health records (2019-2021) are derived from a primary care registry collected by the American Board of Family Medicine. These data were assigned anonymized person-level identifiers (Protected Identification Keys [PIKs]) at the U.S. Census Bureau. These records were then linked to restricted individual-level data from the ACS (2005-2022). We used logistic regressions to evaluate match rates for patients with health conditions across a range of severity: hypertension, diabetes, and chronic kidney disease.

Results: Among more than 2.8 million patients, 99.2% were assigned person-level identifiers (PIKs). There were some differences in the odds of receiving an identifier in adjusted models for patients with hypertension (OR = 1.70, 95% CI: 1.63, 1.77) and diabetes (OR = 1.17, 95% CI: 1.13, 1.22), relative to those without. There were only small differences in the odds of matching to ACS in adjusted models for patients with hypertension (OR = 1.03, 95% CI: 1.03, 1.04), diabetes (OR = 1.02, 95% CI: 1.01, 1.03), and chronic kidney disease (OR = 1.05, 95% CI: 1.03, 1.06), relative to those without.
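
To make the odds ratios and confidence intervals above easier to read, here is a minimal sketch of how an unadjusted odds ratio and its Wald 95% interval can be computed from a 2x2 table. This is not the study's method: the authors report odds ratios from adjusted logistic regressions, and the counts below are hypothetical, chosen only for illustration. The function name `odds_ratio_ci` is likewise an assumption.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a: condition present, linked      b: condition present, not linked
    c: condition absent, linked       d: condition absent, not linked
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(a=900, b=100, c=800, d=200)
print(f"OR = {or_value:.2f}, 95% CI: {ci_low:.2f}, {ci_high:.2f}")
```

An interval that stays close to 1, as in the ACS matching results, indicates that the odds of linkage differ little between patients with and without the condition.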

Discussion and conclusion: Our work supports evidence-building across government consistent with the Foundations for Evidence-Based Policymaking Act of 2018 and the goal of leveraging data as a strategic asset. Given the high PIK and ACS match rates, with small differences based on health condition, our findings suggest the feasibility of enhancing the utility of EHR data for research focused on health.

Pages: 97-104. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648727/pdf/
Citations: 0
Comparison of six natural language processing approaches to assessing firearm access in Veterans Health Administration electronic health records.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-01 DOI: 10.1093/jamia/ocae169
Joshua Trujeque, R Adams Dudley, Nathan Mesfin, Nicholas E Ingraham, Isai Ortiz, Ann Bangerter, Anjan Chakraborty, Dalton Schutte, Jeremy Yeung, Ying Liu, Alicia Woodward-Abel, Emma Bromley, Rui Zhang, Lisa A Brenner, Joseph A Simonetti

Objective: Access to firearms is associated with increased suicide risk. Our aim was to develop a natural language processing approach to characterizing firearm access in clinical records.

Materials and methods: We used clinical notes from 36 685 Veterans Health Administration (VHA) patients between April 10, 2023 and April 10, 2024. We expanded preexisting firearm term sets using subject matter experts and generated 250-character snippets around each firearm term appearing in notes. Annotators labeled 3000 snippets into three classes. Using these annotated snippets, we compared four nonneural machine learning models (random forest, bagging, gradient boosting, logistic regression with ridge penalization) and two versions of Bidirectional Encoder Representations from Transformers, or BERT (specifically, BioBERT and Bio-ClinicalBERT) for classifying firearm access as "definite access", "definitely no access", or "other".
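
The snippet-generation step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say whether the 250 characters are centered on the term or how overlapping matches are handled, so the centering rule, the word-boundary matching, and the names `extract_snippets` and `width` are all hypothetical choices.

```python
import re


def extract_snippets(note: str, terms: list[str], width: int = 250) -> list[str]:
    """Return one snippet of at most `width` characters per term occurrence."""
    if not terms:
        return []
    # Longest terms first so that "gun safe" is preferred over "gun" at the same position.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)

    snippets = []
    for match in pattern.finditer(note):
        center = (match.start() + match.end()) // 2
        start = max(0, center - width // 2)
        end = min(len(note), start + width)
        # Re-anchor the window when the match sits near the end of the note.
        start = max(0, end - width)
        snippets.append(note[start:end])
    return snippets


# Hypothetical note text and term list, for illustration only.
example_note = "Patient reports that he keeps a rifle locked in a gun safe at home."
example_snippets = extract_snippets(example_note, ["rifle", "gun safe", "gun"])
print(len(example_snippets))
```

Each returned snippet would then be handed to annotators or to a classifier; notes shorter than the window are returned whole.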

Results: Firearm terms were identified in 36 685 patient records (41.3%); 33.7% of snippets were categorized as definite access, 9.0% as definitely no access, and 57.2% as "other". Among models classifying firearm access, five of six had acceptable performance, with BioBERT and Bio-ClinicalBERT performing best, with F1s of 0.876 (95% confidence interval, 0.874-0.879) and 0.896 (95% confidence interval, 0.894-0.899), respectively.
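
For readers less familiar with the F1 metric reported above, the sketch below computes per-class and macro-averaged F1 for a three-class labeling task. The abstract does not state which averaging scheme was used, so macro averaging is an assumption here, and the labels and predictions are toy values rather than study data.

```python
def per_class_f1(y_true, y_pred, labels):
    """Return a dict mapping each label to its F1 score (0.0 when undefined)."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
        scores[label] = 2 * tp / denominator if denominator else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Toy annotations, not from the study.
LABELS = ["definite access", "definitely no access", "other"]
gold = ["definite access", "definite access", "definitely no access", "other", "other", "other"]
pred = ["definite access", "other", "definitely no access", "other", "other", "definite access"]

class_scores = per_class_f1(gold, pred, LABELS)
overall = macro_f1(gold, pred, LABELS)
print(class_scores, round(overall, 3))
```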

Discussion and conclusion: Firearm-related terminology is common in the clinical records of VHA patients. The ability to use text to identify and characterize patients' firearm access could enhance suicide prevention efforts, and five of our six models could be used to identify patients for clinical interventions.

Pages: 113-118. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648724/pdf/
Citations: 0
Mini-mental status examination phenotyping for Alzheimer's disease patients using both structured and narrative electronic health record features.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-01 DOI: 10.1093/jamia/ocae274
Betina Idnay, Gongbo Zhang, Fangyi Chen, Casey N Ta, Matthew W Schelke, Karen Marder, Chunhua Weng

Objective: This study aims to automate the prediction of Mini-Mental State Examination (MMSE) scores, a widely adopted standard for cognitive assessment in patients with Alzheimer's disease, using natural language processing (NLP) and machine learning (ML) on structured and unstructured EHR data.

Materials and methods: We extracted demographic data, diagnoses, medications, and unstructured clinical visit notes from the EHRs. We used Latent Dirichlet Allocation (LDA) for topic modeling and Term-Frequency Inverse Document Frequency (TF-IDF) for n-grams. In addition, we extracted meta-features such as age, ethnicity, and race. Model training and evaluation employed eXtreme Gradient Boosting (XGBoost), Stochastic Gradient Descent Regressor (SGDRegressor), and Multi-Layer Perceptron (MLP).
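
As a rough illustration of the TF-IDF n-gram features mentioned above, the sketch below computes textbook TF-IDF weights for unigrams and bigrams using only the standard library. It is not the authors' pipeline: the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, the n-gram range, and the function names are all assumptions, and library implementations often use smoothed variants that give different numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of space-joined n-grams in a token sequence."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n_values=(1, 2)):
    """Return one {ngram: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    counts = [
        Counter(gram for n in n_values for gram in ngrams(tokens, n))
        for tokens in tokenized
    ]
    # Document frequency: number of documents containing each n-gram.
    doc_freq = Counter(gram for count in counts for gram in count)
    n_docs = len(docs)

    weights = []
    for count in counts:
        total = sum(count.values())
        weights.append({
            gram: (k / total) * math.log(n_docs / doc_freq[gram])
            for gram, k in count.items()
        })
    return weights


# Hypothetical note fragments, for illustration only.
notes = ["patient oriented to time", "patient disoriented to place"]
features = tfidf(notes)
print(sorted(features[0], key=features[0].get, reverse=True)[:3])
```

N-grams that appear in every note (here "patient") receive a weight of zero under this unsmoothed formula, which is why distinctive phrases dominate the feature vector.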

Results: We analyzed 1654 clinical visit notes collected between September 2019 and June 2023 for 1000 Alzheimer's disease patients. The average MMSE score was 20, with patients averaging 76.4 years old, 54.7% female, and 54.7% identifying as White. The best-performing model (ie, the one with the lowest root mean squared error [RMSE]) was the MLP, which achieved an RMSE of 5.53 on the validation set using n-grams, indicating better prediction performance than the other models and feature sets. The RMSE on the test set was 5.85.
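
The RMSE used to rank the models above is the square root of the mean squared difference between predicted and observed scores, so it is expressed in MMSE points (the MMSE ranges from 0 to 30). A minimal sketch, with hypothetical scores rather than study data:

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed and predicted MMSE scores.
observed = [20, 25, 15, 28]
predicted = [23, 21, 15, 28]
error = rmse(observed, predicted)
print(error)
```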

Discussion: This study developed an ML method to predict MMSE scores from unstructured clinical notes, demonstrating the feasibility of utilizing NLP to support cognitive assessment. Future work should focus on refining the model and evaluating its clinical relevance across diverse settings.

Conclusion: We contributed a model for automating MMSE estimation using EHR features, potentially transforming cognitive assessment for Alzheimer's patients and paving the way for more informed clinical decisions and cohort identification.

Pages: 119-128. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648712/pdf/
Citations: 0