AlpaPICO: Extraction of PICO Frames from Clinical Trial Documents Using LLMs

Madhusudan Ghosh, Shrimon Mukherjee, Asmit Ganguly, Partha Basuchowdhuri, Sudip Kumar Naskar, Debasis Ganguly
arXiv:2409.09704 · arXiv - CS - Information Retrieval · Journal Article · Published 2024-09-15
Citations: 0

Abstract

In recent years, there has been a surge in the publication of clinical trial reports, making it challenging to conduct systematic reviews. Automatically extracting Population, Intervention, Comparator, and Outcome (PICO) elements from clinical trial studies can alleviate the traditionally time-consuming process of manually scrutinizing studies for systematic reviews. Existing approaches to PICO frame extraction are supervised and rely on manually annotated data points in the form of BIO label tags. Recent approaches such as In-Context Learning (ICL), which has been shown to be effective for a number of downstream NLP tasks, also require labeled examples. In this work, we adopt an ICL strategy that leverages the knowledge Large Language Models (LLMs) acquire during pretraining to extract PICO-related terminology from clinical trial documents in an unsupervised setup, bypassing the need for a large number of annotated instances. Additionally, to demonstrate the effectiveness of LLMs in the oracle scenario where a large number of annotated samples is available, we adopt an instruction-tuning strategy, employing Low-Rank Adaptation (LoRA) to train a very large model in a low-resource environment for the PICO frame extraction task. Our empirical results show that our ICL-based framework produces comparable results on all versions of the EBM-NLP dataset, and that the instruction-tuned version of our framework achieves state-of-the-art results on all of them. Our project is available at https://github.com/shrimonmuke0202/AlpaPICO.git.
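The ICL setup described above amounts to prompting a pretrained LLM with an instruction and the raw abstract, then parsing the labeled spans from its free-text response. The sketch below illustrates that pipeline in plain Python; the prompt template, the `Label: span` answer format, and the function names are illustrative assumptions, not the authors' exact prompt.

```python
# Hypothetical sketch of zero-shot in-context PICO extraction:
# build an instruction prompt, send it to any LLM, parse the reply.

PICO_LABELS = ["Population", "Intervention", "Comparator", "Outcome"]

def build_pico_prompt(abstract: str) -> str:
    """Compose an instruction-style prompt asking an LLM to mark PICO spans."""
    instruction = (
        "Extract the Population, Intervention, Comparator, and Outcome "
        "spans from the clinical trial abstract below. "
        "Answer with one line per label in the form 'Label: span'.\n\n"
    )
    return instruction + "Abstract: " + abstract

def parse_pico_response(response: str) -> dict:
    """Parse 'Label: span' lines emitted by the model into a dictionary."""
    frames = {}
    for line in response.splitlines():
        if ":" not in line:
            continue
        label, _, span = line.partition(":")
        label = label.strip()
        if label in PICO_LABELS:
            frames[label] = span.strip()
    return frames
```

Because no labeled examples appear in the prompt, this corresponds to the unsupervised setting the abstract describes; a few-shot variant would simply prepend annotated demonstrations before the target abstract.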
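The instruction-tuned variant relies on LoRA, which freezes the pretrained weight matrix W0 and trains only a rank-r update B·A, scaled by alpha/r, so a very large model can be adapted with few trainable parameters. The toy code below illustrates just that low-rank forward computation on tiny hand-built matrices; it is a minimal sketch of the LoRA idea, not the authors' training code (which would typically use a library such as Hugging Face PEFT).

```python
# Toy illustration of the LoRA update: the frozen weight W0 (d x k) is
# augmented by a rank-r product B (d x r) @ A (r x k), so only
# r * (d + k) parameters are trained instead of d * k.

def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_forward(x, w0, a, b, alpha, r):
    """Compute x @ (W0 + (alpha / r) * B @ A) without forming the sum of
    weight matrices explicitly: the rank-r path x @ B @ A is cheap."""
    scale = alpha / r
    base = matmul(x, w0)                 # frozen pretrained path
    delta = matmul(matmul(x, b), a)      # trainable low-rank path
    return [[base[i][j] + scale * delta[i][j]
             for j in range(len(base[0]))] for i in range(len(base))]
```

With B initialized to zeros (the standard LoRA initialization), the adapted model starts out exactly equal to the pretrained one, and training moves only A and B.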