#2924 Comparison of large language models and traditional natural language processing techniques in predicting arteriovenous fistula failure

Pub Date: 2024-05-01 | DOI: 10.1093/ndt/gfae069.792
Suman K. Lama, Hanjie Zhang, C. Monaghan, F. Bellocchio, S. Chaudhuri, Luca Neri, Len Usvyat
{"title":"#2924 Comparison of large language models and traditional natural language processing techniques in predicting arteriovenous fistula failure","authors":"Suman K. Lama, Hanjie Zhang, C. Monaghan, F. Bellocchio, S. Chaudhuri, Luca Neri, Len Usvyat","doi":"10.1093/ndt/gfae069.792","DOIUrl":null,"url":null,"abstract":"\n \n \n Large language models (LLMs) have gained significant attention in the field of natural language processing (NLP), marking a shift from traditional techniques like Term Frequency-Inverse Document Frequency (TF-IDF). We developed a traditional NLP model to predict arteriovenous fistula (AVF) failure within next 30 days using clinical notes. The goal of this analysis was to investigate whether LLMs would outperform traditional NLP techniques, specifically in the context of predicting AVF failure within the next 30 days using clinical notes.\n \n \n \n We defined AVF failure as the change in status from active to permanently unusable status or temporarily unusable status. We used data from a large kidney care network from January 2021 to December 2021. Two models were created using LLMs and traditional TF-IDF technique. We used “distilbert-base-uncased”, a distilled version of BERT base model [1], and compared its performance with traditional TF-IDF-based NLP techniques. The dataset was randomly divided into 60% training, 20% validation and 20% test dataset. The test data, comprising of unseen patients’ data was used to evaluate the performance of the model. Both models were evaluated using metrics such as area under the receiver operating curve (AUROC), accuracy, sensitivity, and specificity.\n \n \n \n The incidence of 30 days AVF failure rate was 2.3% in the population. Both LLMs and traditional showed similar overall performance as summarized in Table 1. Notably, LLMs showed marginally better performance in certain evaluation metrics. Both models had same AUROC of 0.64 on test data. The accuracy and balanced accuracy for LLMs were 72.9% and 59.7%, respectively, compared to 70.9% and 59.6% for the traditional TF-IDF approach. In terms of specificity, LLMs scored 73.2%, slightly higher than the 71.2% observed for traditional NLP methods. However, LLMs had a lower sensitivity of 46.1% compared to 48% for traditional NLP. However, it is worth noting that training on LLMs took considerably longer than TF-IDF. Moreover, it also used higher computational resources such as utilization of graphics processing units (GPU) instances in cloud-based services, leading to higher cost.\n \n \n \n In our study, we discovered that advanced LLMs perform comparably to traditional TF-IDF modeling techniques in predicting the failure of AVF. Both models demonstrated identical AUROC. While specificity was higher in LLMs compared to traditional NLP, sensitivity was higher in traditional NLP compared to LLMs. LLM was fine-tuned with a limited dataset, which could have influenced its performance to be similar to that of traditional NLP methods. This finding suggests that while LLMs may excel in certain scenarios, such as performing in-depth sentiment analysis of patient data for complex tasks, their effectiveness is highly dependent on the specific use case. It is crucial to weigh the benefits against the resources required for LLMs, as they can be significantly more resource-intensive and costly compared to traditional TF-IDF methods. 
This highlights the importance of a use-case-driven approach in selecting the appropriate NLP technique for healthcare applications.\n","PeriodicalId":4,"journal":{"name":"ACS Applied Energy Materials","volume":"87 4","pages":""},"PeriodicalIF":5.4000,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Energy Materials","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1093/ndt/gfae069.792","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CHEMISTRY, PHYSICAL","Score":null,"Total":0}
引用次数: 0

Abstract

Large language models (LLMs) have gained significant attention in the field of natural language processing (NLP), marking a shift away from traditional techniques such as term frequency-inverse document frequency (TF-IDF). We developed a traditional NLP model to predict arteriovenous fistula (AVF) failure within the next 30 days using clinical notes. The goal of this analysis was to investigate whether an LLM would outperform traditional NLP techniques on this task.

We defined AVF failure as a change in access status from active to either permanently or temporarily unusable. We used data from a large kidney care network covering January 2021 to December 2021 and built two models, one with an LLM and one with the traditional TF-IDF technique. For the LLM we used "distilbert-base-uncased", a distilled version of the BERT base model [1], and compared its performance with the traditional TF-IDF-based NLP approach. The dataset was randomly divided into 60% training, 20% validation, and 20% test sets; the test set, comprising data from patients unseen during training, was used to evaluate model performance. Both models were assessed using the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, and specificity.

The incidence of 30-day AVF failure in the population was 2.3%. The LLM and the traditional model showed similar overall performance, as summarized in Table 1, with the LLM performing marginally better on some metrics. Both models had the same AUROC of 0.64 on the test data. The accuracy and balanced accuracy of the LLM were 72.9% and 59.7%, respectively, compared with 70.9% and 59.6% for the traditional TF-IDF approach. The LLM's specificity of 73.2% was slightly higher than the 71.2% observed for the traditional method, but its sensitivity of 46.1% was lower than the traditional model's 48%. It is also worth noting that training the LLM took considerably longer than training the TF-IDF model and consumed more computational resources, such as graphics processing unit (GPU) instances in cloud-based services, leading to higher cost.

In our study, an advanced LLM performed comparably to a traditional TF-IDF modeling technique in predicting AVF failure. Both models had identical AUROC values; specificity was higher for the LLM, whereas sensitivity was higher for the traditional model. The LLM was fine-tuned on a limited dataset, which could explain why its performance was similar to that of the traditional NLP method. This finding suggests that while LLMs may excel in certain scenarios, such as in-depth sentiment analysis of patient data for complex tasks, their effectiveness is highly dependent on the specific use case. It is crucial to weigh the benefits against the resources required, as LLMs can be significantly more resource-intensive and costly than traditional TF-IDF methods. This highlights the importance of a use-case-driven approach when selecting an NLP technique for healthcare applications.
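As a rough illustration of the traditional pipeline described above, the sketch below builds a TF-IDF baseline with a patient-level 60/20/20 split so that test patients are unseen during training. The input file, column names (patient_id, note_text, avf_failure_30d), the logistic regression classifier, and all hyperparameters are illustrative assumptions; the abstract specifies only the TF-IDF features and the split proportions.

```python
# Minimal sketch of a TF-IDF baseline for 30-day AVF failure prediction.
# File/column names, the classifier, and hyperparameters are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupShuffleSplit
from sklearn.pipeline import Pipeline

notes = pd.read_csv("clinical_notes.csv")  # hypothetical extract: one row per note

# Split by patient so the test set contains only unseen patients (60/20/20).
gss = GroupShuffleSplit(n_splits=1, test_size=0.4, random_state=42)
train_idx, rest_idx = next(gss.split(notes, groups=notes["patient_id"]))
train_df, rest_df = notes.iloc[train_idx], notes.iloc[rest_idx]
gss2 = GroupShuffleSplit(n_splits=1, test_size=0.5, random_state=42)
val_idx, test_idx = next(gss2.split(rest_df, groups=rest_df["patient_id"]))
val_df, test_df = rest_df.iloc[val_idx], rest_df.iloc[test_idx]

tfidf_model = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))),
    # class_weight="balanced" counters the low (~2.3%) failure incidence.
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
tfidf_model.fit(train_df["note_text"], train_df["avf_failure_30d"])
tfidf_scores = tfidf_model.predict_proba(test_df["note_text"])[:, 1]
```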
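For the LLM arm, a plausible setup is to fine-tune "distilbert-base-uncased" with the Hugging Face Trainer, reusing the splits from the previous sketch. The abstract names only the pretrained checkpoint; the epochs, batch size, learning rate, and Trainer-based workflow below are assumptions.

```python
# Minimal sketch of fine-tuning "distilbert-base-uncased" on the same task.
# Hyperparameters and the Trainer workflow are assumptions, not reported details.
from datasets import Dataset
from scipy.special import softmax
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

def tokenize(batch):
    # Clinical notes can be long; 512 tokens is DistilBERT's input limit.
    return tokenizer(batch["note_text"], truncation=True, max_length=512)

def to_dataset(df):
    ds = Dataset.from_pandas(df[["note_text", "avf_failure_30d"]])
    return ds.rename_column("avf_failure_30d", "labels").map(tokenize, batched=True)

train_ds, val_ds, test_ds = to_dataset(train_df), to_dataset(val_df), to_dataset(test_df)

args = TrainingArguments(output_dir="avf_distilbert", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                  eval_dataset=val_ds, tokenizer=tokenizer)
trainer.train()

# Predicted probability of AVF failure for the unseen test patients.
logits = trainer.predict(test_ds).predictions
llm_scores = softmax(logits, axis=-1)[:, 1]
```

Fine-tuning a transformer this way typically requires GPU instances, which is consistent with the higher training time and cost the abstract reports for the LLM.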
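Both models can then be scored with the metrics the abstract reports: AUROC, accuracy, balanced accuracy, sensitivity, and specificity. The 0.5 decision threshold used to binarize predicted probabilities below is an assumption; the abstract does not state how the operating point was chosen.

```python
# Sketch of the evaluation metrics reported in the abstract; the 0.5
# threshold for converting probabilities to class labels is an assumption.
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             confusion_matrix, roc_auc_score)

def evaluate(y_true, y_score, threshold=0.5):
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "auroc": roc_auc_score(y_true, y_score),
        "accuracy": accuracy_score(y_true, y_pred),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "sensitivity": tp / (tp + fn),   # recall for the failure class
        "specificity": tn / (tn + fp),
    }

y_test = test_df["avf_failure_30d"]
print("TF-IDF:", evaluate(y_test, tfidf_scores))
print("DistilBERT:", evaluate(y_test, llm_scores))
```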