Assertion Detection in Clinical Natural Language Processing using Large Language Models.

Yuelyu Ji, Zeshui Yu, Yanshan Wang
IEEE International Conference on Healthcare Informatics (ICHI), 2024, pp. 242-247.
DOI: 10.1109/ichi61247.2024.00039
Published: 2024-06-01 (Epub 2024-08-22). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11908446/pdf/

Abstract

In this study, we aim to address the task of assertion detection when extracting medical concepts from clinical notes, a key process in clinical natural language processing (NLP). Assertion detection in clinical NLP typically involves identifying the assertion types of medical concepts in clinical text, namely certainty (whether the medical concept is positive, negated, possible, or hypothetical), temporality (whether the medical concept refers to the present or to past history), and experiencer (whether the medical concept is described for the patient or a family member). These assertion types are essential for healthcare professionals to quickly and clearly understand the context of medical conditions from unstructured clinical text, and they directly influence the quality and outcomes of patient care. Although widely used, traditional methods, particularly rule-based NLP systems and machine learning or deep learning models, demand intensive manual effort to create patterns and tend to overlook less common assertion types, leading to an incomplete understanding of the context. To address this challenge, our research introduces a novel methodology that utilizes Large Language Models (LLMs) pre-trained on a vast array of medical data for assertion detection. We enhanced this method with advanced reasoning techniques, including Tree of Thought (ToT), Chain of Thought (CoT), and Self-Consistency (SC), and refined it further with Low-Rank Adaptation (LoRA) fine-tuning. We first evaluated the model on the i2b2 2010 assertion dataset, where our method achieved a micro-averaged F-1 score of 0.89, an improvement of 0.11 over previous work. To further assess the generalizability of our approach, we extended our evaluation to a local dataset focused on sleep concept extraction, where our approach achieved an F-1 score of 0.74, 0.31 higher than the previous method. The results show that LLMs are a viable option for assertion detection in clinical NLP and can potentially be integrated with other LLM-based concept extraction models for clinical NLP tasks.
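The Self-Consistency step described above can be illustrated with a short sketch: sample several reasoning paths from the model at non-zero temperature and keep the majority assertion label. The `query_llm` callable and the `fake_llm` stand-in below are hypothetical placeholders for an LLM call, not the authors' code; the label set shown follows the i2b2 2010 assertion classes.

```python
from collections import Counter

# i2b2 2010 assertion classes.
ASSERTION_LABELS = {"present", "absent", "possible", "hypothetical",
                    "conditional", "associated_with_someone_else"}

def self_consistent_label(query_llm, note, concept, n_samples=5):
    """Sample the model n_samples times (e.g. at temperature > 0)
    and return the majority assertion label for `concept` in `note`."""
    votes = [query_llm(note, concept) for _ in range(n_samples)]
    label, _ = Counter(votes).most_common(1)[0]
    return label

# Toy stand-in for an LLM call, for illustration only.
def fake_llm(note, concept):
    return "absent" if "denies" in note else "present"

print(self_consistent_label(fake_llm, "Patient denies chest pain.", "chest pain"))
# -> absent
```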
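The LoRA fine-tuning mentioned above rests on a simple idea: instead of learning a full d_out x d_in update to a weight matrix, learn two small factors B (d_out x r) and A (r x d_in) with rank r much smaller than either dimension, so the effective update is delta_W = B @ A. A toy, dependency-free illustration (the matrices and shapes are made up for the example, not taken from the paper):

```python
def matmul(X, Y):
    """Plain-Python matrix product of list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

# Full update would need d_out * d_in parameters; LoRA needs only
# r * (d_out + d_in). Here d_out=2, d_in=2, r=1.
B = [[1], [0]]          # d_out x r
A = [[2, 3]]            # r x d_in
delta_W = matmul(B, A)  # rank-1 update: [[2, 3], [0, 0]]
```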
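For reference, the micro-averaged F-1 reported above pools true positives, false positives, and false negatives across all assertion classes before computing precision and recall (so frequent classes weigh more than in macro-averaging). A minimal sketch with hypothetical per-class counts:

```python
def micro_f1(per_class_counts):
    """per_class_counts maps label -> (tp, fp, fn).
    Micro-averaging sums the counts over classes first,
    then computes a single precision/recall/F-1."""
    tp = sum(c[0] for c in per_class_counts.values())
    fp = sum(c[1] for c in per_class_counts.values())
    fn = sum(c[2] for c in per_class_counts.values())
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for three assertion classes.
counts = {"present": (90, 5, 5), "absent": (40, 5, 5), "possible": (10, 5, 5)}
print(round(micro_f1(counts), 2))
# -> 0.9
```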
