AlpaPICO: Extraction of PICO Frames from Clinical Trial Documents Using LLMs

Madhusudan Ghosh, Shrimon Mukherjee, Asmit Ganguly, Partha Basuchowdhuri, Sudip Kumar Naskar, Debasis Ganguly
arXiv:2409.09704 · arXiv - CS - Information Retrieval · Journal Article · Published 2024-09-15
Citations: 0

Abstract

In recent years, there has been a surge in the publication of clinical trial reports, making it challenging to conduct systematic reviews. Automatically extracting Population, Intervention, Comparator, and Outcome (PICO) elements from clinical trial studies can alleviate the traditionally time-consuming process of manually scrutinizing studies for systematic reviews. Existing approaches to PICO frame extraction are supervised and rely on manually annotated data points in the form of BIO label tags. Recent approaches such as In-Context Learning (ICL), which has been shown to be effective for a number of downstream NLP tasks, also require labeled examples. In this work, we adopt an ICL strategy that leverages the knowledge Large Language Models (LLMs) acquire during pretraining to extract PICO-related terminology from clinical trial documents in an unsupervised setup, bypassing the need for a large number of annotated instances. Additionally, to demonstrate the effectiveness of LLMs in the oracle scenario where a large number of annotated samples is available, we adopt an instruction-tuning strategy, employing Low-Rank Adaptation (LoRA) to train a very large model in a low-resource environment for the PICO frame extraction task. Our empirical results show that our ICL-based framework produces comparable results on all versions of the EBM-NLP dataset, and that the instruction-tuned version of our framework achieves state-of-the-art results on all of them. Our project is available at https://github.com/shrimonmuke0202/AlpaPICO.git.
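The ICL setup described above amounts to prompting a pretrained LLM with an instruction and the raw abstract, then parsing the labeled spans from its free-text response. The sketch below illustrates that pipeline in plain Python; the prompt template, the `Label: span` answer format, and the function names are illustrative assumptions, not the authors' exact prompt.

```python
# Hypothetical sketch of zero-shot in-context PICO extraction:
# build an instruction prompt, send it to any LLM, parse the reply.

PICO_LABELS = ["Population", "Intervention", "Comparator", "Outcome"]

def build_pico_prompt(abstract: str) -> str:
    """Compose an instruction-style prompt asking an LLM to mark PICO spans."""
    instruction = (
        "Extract the Population, Intervention, Comparator, and Outcome "
        "spans from the clinical trial abstract below. "
        "Answer with one line per label in the form 'Label: span'.\n\n"
    )
    return instruction + "Abstract: " + abstract

def parse_pico_response(response: str) -> dict:
    """Parse 'Label: span' lines emitted by the model into a dictionary."""
    frames = {}
    for line in response.splitlines():
        if ":" not in line:
            continue
        label, _, span = line.partition(":")
        label = label.strip()
        if label in PICO_LABELS:
            frames[label] = span.strip()
    return frames
```

Because no labeled examples appear in the prompt, this corresponds to the unsupervised setting the abstract describes; a few-shot variant would simply prepend annotated demonstrations before the target abstract.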
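The instruction-tuned variant relies on LoRA, which freezes the pretrained weight matrix W0 and trains only a rank-r update B·A, scaled by alpha/r, so a very large model can be adapted with few trainable parameters. The toy code below illustrates just that low-rank forward computation on tiny hand-built matrices; it is a minimal sketch of the LoRA idea, not the authors' training code (which would typically use a library such as Hugging Face PEFT).

```python
# Toy illustration of the LoRA update: the frozen weight W0 (d x k) is
# augmented by a rank-r product B (d x r) @ A (r x k), so only
# r * (d + k) parameters are trained instead of d * k.

def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_forward(x, w0, a, b, alpha, r):
    """Compute x @ (W0 + (alpha / r) * B @ A) without forming the sum of
    weight matrices explicitly: the rank-r path x @ B @ A is cheap."""
    scale = alpha / r
    base = matmul(x, w0)                 # frozen pretrained path
    delta = matmul(matmul(x, b), a)      # trainable low-rank path
    return [[base[i][j] + scale * delta[i][j]
             for j in range(len(base[0]))] for i in range(len(base))]
```

With B initialized to zeros (the standard LoRA initialization), the adapted model starts out exactly equal to the pretrained one, and training moves only A and B.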