Using Large Language Models to Detect and Understand Drug Discontinuation Events in Web-Based Forums: Development and Validation Study.

Journal of Medical Internet Research. Pub Date: 2025-01-30. DOI: 10.2196/54601. Impact Factor: 6. Zone 2, Medicine. JCR Q1 (Health Care Sciences & Services).
William Trevena, Xiang Zhong, Michelle Alvarado, Alexander Semenov, Alp Oktay, Devin Devlin, Aarya Yogesh Gohil, Sai Harsha Chittimouju
Volume 27, article e54601. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11826943/pdf/
Citation count: 0

Abstract

Background: The implementation of large language models (LLMs), such as BART (Bidirectional and Auto-Regressive Transformers) and GPT-4, has revolutionized the extraction of insights from unstructured text. These advancements have expanded into health care, allowing analysis of social media for public health insights. However, the detection of drug discontinuation events (DDEs) remains underexplored. Identifying DDEs is crucial for understanding medication adherence and patient outcomes.

Objective: The aim of this study is to provide a flexible framework for investigating various clinical research questions in data-sparse environments. We provide an example of the utility of this framework by identifying DDEs and their root causes in an open-source web-based forum, MedHelp, and by releasing the first open-source DDE datasets to aid further research in this domain.

Methods: We used several LLMs, including GPT-4 Turbo, GPT-4o, DeBERTa (Decoding-Enhanced Bidirectional Encoder Representations from Transformers with Disentangled Attention), and BART, among others, to detect DDEs and determine their root causes in user comments posted on MedHelp. Our study design included the use of zero-shot classification, which allows these models to make predictions without task-specific training. We split user comments into sentences and applied different classification strategies to assess the performance of these models in identifying DDEs and their root causes.
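
The sentence-level workflow described in the Methods can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the regex sentence splitter, the `keyword_scorer` stand-in, the candidate label strings, and the 0.5 threshold are all assumptions made for the example. In practice the scorer would be swapped for a zero-shot model that returns one score per candidate label.

```python
import re
from typing import Callable, Dict, List

# Hypothetical candidate labels; the study's actual label wording is not given here.
LABELS = ["drug discontinuation", "other"]

# A scorer maps (sentence, labels) to one score per label.
Scorer = Callable[[str, List[str]], Dict[str, float]]


def split_sentences(comment: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", comment.strip())
    return [p for p in parts if p]


def keyword_scorer(sentence: str, labels: List[str]) -> Dict[str, float]:
    """Stand-in for a zero-shot model (assumption, for illustration only).

    A real zero-shot classifier would score each label without
    task-specific training; this stub only checks a few cue phrases.
    """
    cues = ("stopped taking", "quit taking", "went off", "discontinued")
    hit = any(cue in sentence.lower() for cue in cues)
    dde_score = 0.9 if hit else 0.1
    return {labels[0]: dde_score, labels[1]: 1.0 - dde_score}


def detect_dde_sentences(
    comment: str, scorer: Scorer, threshold: float = 0.5
) -> List[str]:
    """Return the sentences whose DDE score meets the threshold."""
    flagged = []
    for sentence in split_sentences(comment):
        scores = scorer(sentence, LABELS)
        if scores[LABELS[0]] >= threshold:
            flagged.append(sentence)
    return flagged


# Invented example comment, not taken from MedHelp.
comment = (
    "I was on this medication for two years. "
    "I stopped taking it last month because of the nausea. "
    "Has anyone else had this?"
)
flagged = detect_dde_sentences(comment, keyword_scorer)
print(flagged)
```

Keeping the scorer as a plain function argument means the same loop can be run with different models or prompting strategies, which mirrors the comparison of classification strategies described above.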

Results: Among the selected models, GPT-4o performed best at determining the root causes of DDEs, with a Hamming loss of 12.9% (the share of root-cause labels predicted incorrectly). Among the open-source models tested, BART performed best at detecting DDEs, achieving an F1-score of 0.86, a false positive rate of 2.8%, and a false negative rate of 6.5%, all without any fine-tuning. DDEs made up 10.7% (107/1000) of the dataset, which underscores the models' robustness on imbalanced data.
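
For readers less familiar with the reported metrics, the sketch below computes them on toy data. The label vectors are invented for illustration and are not the study's data, and the function names are assumptions. Hamming loss is the fraction of individual label assignments that are wrong in a multilabel prediction; the F1-score, false positive rate, and false negative rate follow from binary confusion counts.

```python
from typing import List, Tuple


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of label slots that differ between truth and prediction."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def binary_rates(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Return (F1, false positive rate, false negative rate).

    Assumes both classes occur in y_true and at least one positive is
    predicted or present, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    f1 = 2 * tp / (2 * tp + fp + fn)
    fpr = fp / (fp + tn)  # share of true negatives flagged as positive
    fnr = fn / (fn + tp)  # share of true positives that were missed
    return f1, fpr, fnr


# Toy multilabel example: 2 sentences, 4 hypothetical root-cause labels each.
root_true = [[1, 0, 0, 1], [0, 1, 0, 0]]
root_pred = [[1, 0, 1, 1], [0, 1, 0, 0]]
loss = hamming_loss(root_true, root_pred)

# Toy binary example: 3 positives among 10 sentences (imbalanced).
dde_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
dde_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
f1, fpr, fnr = binary_rates(dde_true, dde_pred)

print(loss, f1, fpr, fnr)
```

On imbalanced data such as this, accuracy alone can look high even when positives are missed, which is why the F1-score and the two error rates are reported separately.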

Conclusions: This study demonstrated the effectiveness of open- and closed-source LLMs, such as GPT-4o and BART, for detecting DDEs and their root causes from publicly accessible data through zero-shot classification. The robust and scalable framework we propose can aid researchers in addressing data-sparse clinical research questions. The launch of open-access DDE datasets has the potential to stimulate further research and novel discoveries in this field.

Source journal
CiteScore: 14.40
Self-citation rate: 5.40%
Articles published: 654
Review time: 1 month
About the journal: The Journal of Medical Internet Research (JMIR) is a highly respected publication in the field of health informatics and health services. Founded in 1999, JMIR has been a pioneer in the field for over two decades. The journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It is recognized as a top publication in these disciplines, ranking in the first quartile (Q1) by Impact Factor, and it is ranked #1 on Google Scholar within the "Medical Informatics" discipline.
Latest articles from this journal
Advancing Nursing Data Integration Through a Nursing Minimum Dataset for the Conceptual and Technical Development of a "Fall Prevention" Data Module: Development Study.
When Old Diseases Return: Cholera, Crisis, and Digital Surveillance in Fragile Settings.
Effectiveness of Postdischarge Telephone Calls in Reducing Hospital Utilization: Quasi-Randomized Controlled Trial.
Effects of Self-Compassion and Mindfulness Interventions on Mental Health and Work-Related Outcomes Among Japanese Workers: Randomized Controlled Trial.
Disclaimers and Referral Patterns for Medical Advice Across Urgency Levels: Large Language Model Evaluation Study.