Disambiguating Clinical Abbreviations by One-to-All Classification: Algorithm Development and Validation Study.

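JMIR Medical Informatics, 2024;12:e56955 (category: Medical Informatics; IF 3.1; JCR Q2; CAS Zone 3, Medicine). Pub Date: 2024-10-01. DOI: 10.2196/56955. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11460304/pdf/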

Sheng-Feng Sung, Ya-Han Hu, Chong-Yan Chen

Abstract

Background: Electronic medical records serve as a comprehensive repository of patient data, including free-text records such as surgical and imaging reports. They are highly useful for clinical decision support systems, but the widespread use of ambiguous, nonstandardized abbreviations in clinical documents makes natural language processing for these systems difficult. Efficient abbreviation disambiguation methods are therefore needed for effective information extraction.

Objective: This study aims to enhance the one-to-all (OTA) framework for clinical abbreviation expansion, which uses a single model to predict the meanings of multiple abbreviations. The objective is to improve OTA by developing context-candidate pairs and optimizing word embeddings in Bidirectional Encoder Representations From Transformers (BERT), and to evaluate the model's efficacy in expanding clinical abbreviations using real data.

Methods: Three datasets were used: Medical Subject Headings Word Sense Disambiguation, University of Minnesota, and a dataset from Ditmanson Medical Foundation Chia-Yi Christian Hospital. Texts containing polysemous abbreviations were preprocessed and formatted for BERT. Two pretrained models, ClinicalBERT and BlueBERT, were fine-tuned, and dataset pairs for training and testing were generated following the method of Huang et al.
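The abstract does not spell out how context-candidate pairs are built, so the following is only a minimal sketch of one plausible reading: each context containing an abbreviation is paired with every candidate expansion, and the pair is labeled 1 if the candidate is the correct sense and 0 otherwise, so that a single binary classifier can serve all abbreviations. The sense inventory, the example sentence, and the `build_pairs` helper are hypothetical illustrations and are not taken from the paper or from Huang et al.

```python
from typing import Dict, List, Tuple

# Hypothetical sense inventory: abbreviation -> candidate expansions.
SENSE_INVENTORY: Dict[str, List[str]] = {
    "RA": ["rheumatoid arthritis", "right atrium", "room air"],
}


def build_pairs(
    context: str,
    abbreviation: str,
    gold: str,
    inventory: Dict[str, List[str]],
) -> List[Tuple[str, str, int]]:
    """Return one (context, candidate, label) triple per candidate sense.

    label is 1 when the candidate equals the gold expansion, else 0.
    """
    return [
        (context, candidate, int(candidate == gold))
        for candidate in inventory[abbreviation]
    ]


# Hypothetical clinical sentence in which "RA" means "room air".
pairs = build_pairs(
    "Patient saturating 98% on RA.", "RA", "room air", SENSE_INVENTORY
)
for context, candidate, label in pairs:
    print(label, "|", context, "|", candidate)
```

Under this assumption, each triple would be fed to a BERT-style model as a sentence pair (context, candidate), and at prediction time the candidate with the highest score would be chosen as the expansion.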

Results: BlueBERT achieved macro- and microaccuracies of 95.41% and 95.16%, respectively, on the Medical Subject Headings Word Sense Disambiguation dataset. It improved macroaccuracy by 0.54%-1.53% compared to two baselines, long short-term memory and deepBioWSD with random embedding. On the University of Minnesota dataset, BlueBERT recorded macro- and microaccuracies of 98.40% and 98.22%, respectively. Against the baselines of Word2Vec + support vector machine and BioWordVec + support vector machine, BlueBERT demonstrated a macroaccuracy improvement of 2.61%-4.13%.
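The abstract reports both macro- and microaccuracy without defining them. The sketch below assumes the definitions commonly used in abbreviation disambiguation work: microaccuracy is the fraction of all instances predicted correctly, while macroaccuracy is the unweighted mean of the per-abbreviation accuracies. The function name and the toy records are hypothetical and are not data from the study.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def micro_macro_accuracy(
    records: Iterable[Tuple[str, str, str]],
) -> Tuple[float, float]:
    """Compute (micro, macro) accuracy from (abbreviation, gold, predicted) records."""
    # per_abbr[abbr] = [number correct, number of instances]
    per_abbr = defaultdict(lambda: [0, 0])
    for abbr, gold, pred in records:
        per_abbr[abbr][0] += int(gold == pred)
        per_abbr[abbr][1] += 1

    total_correct = sum(correct for correct, _ in per_abbr.values())
    total = sum(count for _, count in per_abbr.values())
    micro = total_correct / total
    macro = sum(correct / count for correct, count in per_abbr.values()) / len(per_abbr)
    return micro, macro


# Toy records: "RA" has 3 of 4 correct, "PT" has 1 of 1 correct.
records = [
    ("RA", "room air", "room air"),
    ("RA", "right atrium", "room air"),
    ("RA", "room air", "room air"),
    ("RA", "rheumatoid arthritis", "rheumatoid arthritis"),
    ("PT", "physical therapy", "physical therapy"),
]
micro, macro = micro_macro_accuracy(records)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

With these toy records, microaccuracy is 4/5 and macroaccuracy is (3/4 + 1/1)/2, which shows how abbreviations with few instances weigh more heavily in the macro figure.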

Conclusions: This research preliminarily validated the effectiveness of the OTA method for abbreviation disambiguation in medical texts, demonstrating the potential to enhance both clinical staff efficiency and research effectiveness.

Source journal: JMIR Medical Informatics (Medicine-Health Informatics)

CiteScore: 7.90

Self-citation rate: 3.10%

Articles published: 173

Review time: 12 weeks

About the journal: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It focuses on applied, translational research and has a broad readership that includes clinicians, CIOs, engineers, and industry and health informatics professionals. It is published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175). JMIR Med Inform has a slightly different scope: it places more emphasis on applications for clinicians and health professionals than on consumers and citizens, who are the focus of JMIR. It also publishes even faster and accepts papers that are more technical or more formative than those published in the Journal of Medical Internet Research.