Exploration of Using an Open-Source Large Language Model for Analyzing Trial Information: A Case Study of Clinical Trials With Decentralized Elements

IF 3.1 3区 医学 Q2 MEDICINE, RESEARCH & EXPERIMENTAL Cts-Clinical and Translational Science Pub Date : 2025-03-02 DOI:10.1111/cts.70183
Ki Young Huh, Ildae Song, Yoonjin Kim, Jiyeon Park, Hyunwook Ryu, JaeEun Koh, Kyung-Sang Yu, Kyung Hwan Kim, SeungHwan Lee
{"title":"Exploration of Using an Open-Source Large Language Model for Analyzing Trial Information: A Case Study of Clinical Trials With Decentralized Elements","authors":"Ki Young Huh,&nbsp;Ildae Song,&nbsp;Yoonjin Kim,&nbsp;Jiyeon Park,&nbsp;Hyunwook Ryu,&nbsp;JaeEun Koh,&nbsp;Kyung-Sang Yu,&nbsp;Kyung Hwan Kim,&nbsp;SeungHwan Lee","doi":"10.1111/cts.70183","DOIUrl":null,"url":null,"abstract":"<p>Despite interest in clinical trials with decentralized elements (DCTs), analysis of their trends in trial registries is lacking due to heterogeneous designs and unstandardized terms. We explored Llama 3, an open-source large language model, to efficiently evaluate these trends. Trial data were sourced from Aggregate Analysis of ClinicalTrials.gov, focusing on drug trials conducted between 2018 and 2023. We utilized three Llama 3 models with a different number of parameters: 8b (model 1), fine-tuned 8b (model 2) with curated data, and 70b (model 3). Prompt engineering enabled sophisticated tasks such as classification of DCTs with explanations and extracting decentralized elements. Model performance, evaluated on a 3-month exploratory test dataset, demonstrated that sensitivity could be improved after fine-tuning from 0.0357 to 0.5385. Low positive predictive value in the fine-tuned model 2 could be improved by focusing on trials with DCT-associated expressions from 0.5385 to 0.9167. However, the extraction of decentralized elements was only properly performed by model 3, which had a larger number of parameters. Based on the results, we screened the entire 6-year dataset after applying DCT-associated expressions. After the subsequent application of models 2 and 3, we identified 692 DCTs. We found that a total of 213 trials were classified as phase 2, followed by 162 phase 4 trials, 112 phase 3 trials, and 92 phase 1 trials. In conclusion, our study demonstrated the potential of large language models for analyzing clinical trial information not structured in a machine-readable format. Managing potential biases during model application is crucial.</p>","PeriodicalId":50610,"journal":{"name":"Cts-Clinical and Translational Science","volume":"18 3","pages":""},"PeriodicalIF":3.1000,"publicationDate":"2025-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/cts.70183","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cts-Clinical and Translational Science","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/cts.70183","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICINE, RESEARCH & EXPERIMENTAL","Score":null,"Total":0}
引用次数: 0

Abstract

Despite interest in clinical trials with decentralized elements (DCTs), analysis of their trends in trial registries is lacking due to heterogeneous designs and unstandardized terms. We explored Llama 3, an open-source large language model, to efficiently evaluate these trends. Trial data were sourced from Aggregate Analysis of ClinicalTrials.gov, focusing on drug trials conducted between 2018 and 2023. We utilized three Llama 3 models with a different number of parameters: 8b (model 1), fine-tuned 8b (model 2) with curated data, and 70b (model 3). Prompt engineering enabled sophisticated tasks such as classification of DCTs with explanations and extracting decentralized elements. Model performance, evaluated on a 3-month exploratory test dataset, demonstrated that sensitivity could be improved after fine-tuning from 0.0357 to 0.5385. Low positive predictive value in the fine-tuned model 2 could be improved by focusing on trials with DCT-associated expressions from 0.5385 to 0.9167. However, the extraction of decentralized elements was only properly performed by model 3, which had a larger number of parameters. Based on the results, we screened the entire 6-year dataset after applying DCT-associated expressions. After the subsequent application of models 2 and 3, we identified 692 DCTs. We found that a total of 213 trials were classified as phase 2, followed by 162 phase 4 trials, 112 phase 3 trials, and 92 phase 1 trials. In conclusion, our study demonstrated the potential of large language models for analyzing clinical trial information not structured in a machine-readable format. Managing potential biases during model application is crucial.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
求助全文
约1分钟内获得全文 去求助
来源期刊
Cts-Clinical and Translational Science
Cts-Clinical and Translational Science 医学-医学:研究与实验
CiteScore
6.70
自引率
2.60%
发文量
234
审稿时长
6-12 weeks
期刊介绍: Clinical and Translational Science (CTS), an official journal of the American Society for Clinical Pharmacology and Therapeutics, highlights original translational medicine research that helps bridge laboratory discoveries with the diagnosis and treatment of human disease. Translational medicine is a multi-faceted discipline with a focus on translational therapeutics. In a broad sense, translational medicine bridges across the discovery, development, regulation, and utilization spectrum. Research may appear as Full Articles, Brief Reports, Commentaries, Phase Forwards (clinical trials), Reviews, or Tutorials. CTS also includes invited didactic content that covers the connections between clinical pharmacology and translational medicine. Best-in-class methodologies and best practices are also welcomed as Tutorials. These additional features provide context for research articles and facilitate understanding for a wide array of individuals interested in clinical and translational science. CTS welcomes high quality, scientifically sound, original manuscripts focused on clinical pharmacology and translational science, including animal, in vitro, in silico, and clinical studies supporting the breadth of drug discovery, development, regulation and clinical use of both traditional drugs and innovative modalities.
期刊最新文献
Agents for Change: Artificial Intelligent Workflows for Quantitative Clinical Pharmacology and Translational Sciences Pharmacokinetics and Pharmacodynamics of KT-474, a Novel Selective Interleukin-1 Receptor–Associated Kinase 4 (IRAK4) Degrader, in Healthy Adults The Impact of Heart Rate Reduction From Individual Baseline With Propranolol for Primary and Secondary Prophylaxis of Variceal Hemorrhage in Cirrhosis Evaluation of Innate Immune System, Body Habitus, and Sex on the Pharmacokinetics and Pharmacodynamics of Anetumab Ravtansine in Patients With Cancer Transarterial Chemoembolization for Patients With Hepatocellular Carcinoma Using Miriplatin Without the Need for Hydration
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1