Leveraging Large Language Models for Clinical Abbreviation Disambiguation.

IF 5.7; Zone 3 (Medicine); Q1 HEALTH CARE SCIENCES & SERVICES; Journal of Medical Systems; Pub Date: 2024-02-27; DOI: 10.1007/s10916-024-02049-z

Manda Hosseini, Mandana Hosseini, Reza Javidan

Journal of Medical Systems 48(1), 27 (2024). Publication type: Journal Article. https://doi.org/10.1007/s10916-024-02049-z

Citations: 0

Abstract

Clinical abbreviation disambiguation is a crucial task in the biomedical domain, because accurately identifying the intended meanings (expansions) of abbreviations in clinical texts is vital for medical information retrieval and analysis. Existing approaches have shown promising results, but challenges such as limited instances per sense and ambiguous interpretations persist. In this paper, we propose an approach to address these challenges and enhance the performance of clinical abbreviation disambiguation. Our objective is to leverage the power of Large Language Models (LLMs) and employ a Generative Model (GM) to augment the dataset with contextually relevant instances, enabling more accurate disambiguation across diverse clinical contexts. We integrate the contextual understanding of LLMs, represented by BlueBERT and Transformers, with data augmentation using a Generative Model, the Biomedical Generative Pre-trained Transformer (BIOGPT), which is pretrained on an extensive corpus of biomedical literature to capture the intricacies of medical terminology and context. By providing BIOGPT with relevant medical terms and sense information, we generate diverse instances of clinical text that accurately represent the intended meanings of abbreviations. We evaluate our approach on the widely recognized CASI dataset, partitioned into training, validation, and test sets. Data augmentation with the GM improves the model's performance, particularly for senses with limited instances, addressing dataset imbalance and the challenges posed by similar concepts. The results demonstrate the efficacy of our proposed method and the significance of LLMs and generative techniques in clinical abbreviation disambiguation. Our model achieves good accuracy on the test set, outperforming previous methods.
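The augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt wording, the function names (`augmentation_plan`, `build_prompt`, `augment`), the per-sense target, and the toy sentences are all assumptions made for illustration. The generator is a plain callable so that a real generative model such as BIOGPT could be plugged in; here a stub stands in for it so the example runs without any model download.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str, str]  # (abbreviation, sense, sentence)
Generator = Callable[[str, int], List[str]]  # (prompt, n) -> n candidate sentences


def augmentation_plan(train: List[Example], target: int) -> Dict[Tuple[str, str], int]:
    """Return how many synthetic sentences each under-represented sense needs."""
    counts = Counter((abbr, sense) for abbr, sense, _ in train)
    return {key: target - n for key, n in counts.items() if n < target}


def build_prompt(abbr: str, sense: str) -> str:
    """Hypothetical prompt conditioning the generator on the abbreviation and its sense."""
    return f"In this clinical note, {abbr} means {sense}. "


def augment(train: List[Example], generate: Generator, target: int) -> List[Example]:
    """Top up each rare sense with generated sentences; the input list is not modified."""
    synthetic: List[Example] = []
    for (abbr, sense), deficit in sorted(augmentation_plan(train, target).items()):
        for text in generate(build_prompt(abbr, sense), deficit):
            if abbr in text:  # keep only candidates that still contain the abbreviation
                synthetic.append((abbr, sense, text))
    return train + synthetic


def stub_generator(prompt: str, n: int) -> List[str]:
    """Stand-in for a generative model (assumption): returns n templated sentences."""
    return [f"{prompt}Synthetic sentence {i}." for i in range(n)]


# Toy training split (invented sentences, not taken from CASI).
toy_train: List[Example] = [
    ("RA", "rheumatoid arthritis", "History of RA treated with methotrexate."),
    ("RA", "rheumatoid arthritis", "RA flare involving both wrists."),
    ("RA", "rheumatoid arthritis", "Seen in clinic for RA follow-up."),
    ("RA", "right atrium", "Echo shows a dilated RA."),
]

augmented = augment(toy_train, stub_generator, target=3)
sense_counts = Counter(sense for _, sense, _ in augmented)
print(sense_counts)
```

In this sketch only the training split is augmented, so that validation and test sets would still contain only original sentences; whether the paper follows the same convention is not stated in the abstract.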


Source journal

Journal of Medical Systems (Medicine: Health Care)

CiteScore

11.60

Self-citation rate

1.90%

Articles published

Review time

4.8 months

Journal description: Journal of Medical Systems provides a forum for the presentation and discussion of the increasingly extensive applications of new systems techniques and methods in hospital, clinic, and physician's office administration; pathology, radiology, and pharmaceutical delivery systems; medical records storage and retrieval; and ancillary patient-support systems. The journal publishes informative articles, essays, and studies across the entire scale of medical systems, from large hospital programs to novel small-scale medical services. Education is an integral part of this amalgamation of sciences, and selected articles are published in this area. Since existing medical systems are constantly being modified to fit particular circumstances and to solve specific problems, the journal includes a special section devoted to status reports on current installations.