{"title":"AlpaPICO:使用 LLM 从临床试验文档中提取 PICO 框架","authors":"Madhusudan Ghosh, Shrimon Mukherjee, Asmit Ganguly, Partha Basuchowdhuri, Sudip Kumar Naskar, Debasis Ganguly","doi":"arxiv-2409.09704","DOIUrl":null,"url":null,"abstract":"In recent years, there has been a surge in the publication of clinical trial\nreports, making it challenging to conduct systematic reviews. Automatically\nextracting Population, Intervention, Comparator, and Outcome (PICO) from\nclinical trial studies can alleviate the traditionally time-consuming process\nof manually scrutinizing systematic reviews. Existing approaches of PICO frame\nextraction involves supervised approach that relies on the existence of\nmanually annotated data points in the form of BIO label tagging. Recent\napproaches, such as In-Context Learning (ICL), which has been shown to be\neffective for a number of downstream NLP tasks, require the use of labeled\nexamples. In this work, we adopt ICL strategy by employing the pretrained\nknowledge of Large Language Models (LLMs), gathered during the pretraining\nphase of an LLM, to automatically extract the PICO-related terminologies from\nclinical trial documents in unsupervised set up to bypass the availability of\nlarge number of annotated data instances. Additionally, to showcase the highest\neffectiveness of LLM in oracle scenario where large number of annotated samples\nare available, we adopt the instruction tuning strategy by employing Low Rank\nAdaptation (LORA) to conduct the training of gigantic model in low resource\nenvironment for the PICO frame extraction task. Our empirical results show that\nour proposed ICL-based framework produces comparable results on all the version\nof EBM-NLP datasets and the proposed instruction tuned version of our framework\nproduces state-of-the-art results on all the different EBM-NLP datasets. Our\nproject is available at \\url{https://github.com/shrimonmuke0202/AlpaPICO.git}.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"18 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AlpaPICO: Extraction of PICO Frames from Clinical Trial Documents Using LLMs\",\"authors\":\"Madhusudan Ghosh, Shrimon Mukherjee, Asmit Ganguly, Partha Basuchowdhuri, Sudip Kumar Naskar, Debasis Ganguly\",\"doi\":\"arxiv-2409.09704\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, there has been a surge in the publication of clinical trial\\nreports, making it challenging to conduct systematic reviews. Automatically\\nextracting Population, Intervention, Comparator, and Outcome (PICO) from\\nclinical trial studies can alleviate the traditionally time-consuming process\\nof manually scrutinizing systematic reviews. Existing approaches of PICO frame\\nextraction involves supervised approach that relies on the existence of\\nmanually annotated data points in the form of BIO label tagging. Recent\\napproaches, such as In-Context Learning (ICL), which has been shown to be\\neffective for a number of downstream NLP tasks, require the use of labeled\\nexamples. 
In this work, we adopt ICL strategy by employing the pretrained\\nknowledge of Large Language Models (LLMs), gathered during the pretraining\\nphase of an LLM, to automatically extract the PICO-related terminologies from\\nclinical trial documents in unsupervised set up to bypass the availability of\\nlarge number of annotated data instances. Additionally, to showcase the highest\\neffectiveness of LLM in oracle scenario where large number of annotated samples\\nare available, we adopt the instruction tuning strategy by employing Low Rank\\nAdaptation (LORA) to conduct the training of gigantic model in low resource\\nenvironment for the PICO frame extraction task. Our empirical results show that\\nour proposed ICL-based framework produces comparable results on all the version\\nof EBM-NLP datasets and the proposed instruction tuned version of our framework\\nproduces state-of-the-art results on all the different EBM-NLP datasets. Our\\nproject is available at \\\\url{https://github.com/shrimonmuke0202/AlpaPICO.git}.\",\"PeriodicalId\":501281,\"journal\":{\"name\":\"arXiv - CS - Information Retrieval\",\"volume\":\"18 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.09704\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.09704","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
AlpaPICO: Extraction of PICO Frames from Clinical Trial Documents Using LLMs
In recent years, there has been a surge in the publication of clinical trial
reports, making it challenging to conduct systematic reviews. Automatically
extracting Population, Intervention, Comparator, and Outcome (PICO) from
clinical trial studies can alleviate the traditionally time-consuming process
of manually scrutinizing systematic reviews. Existing approaches to PICO frame
extraction rely on supervised learning, which requires manually annotated data
points in the form of BIO label tagging. Recent approaches, such as In-Context
Learning (ICL), which has been shown to be effective for a number of downstream
NLP tasks, still require labeled examples. In this work, we adopt an ICL strategy
that leverages the knowledge a Large Language Model (LLM) acquires during its
pretraining phase to automatically extract PICO-related terminology from clinical
trial documents in an unsupervised setup, thereby bypassing the need for a large
number of annotated data instances. Additionally, to showcase the full
effectiveness of LLMs in an oracle scenario where a large number of annotated
samples are available, we adopt an instruction-tuning strategy, employing
Low-Rank Adaptation (LoRA) to train a very large model in a low-resource
environment for the PICO frame extraction task. Our empirical results show that
our proposed ICL-based framework produces comparable results on all versions of
the EBM-NLP dataset, and the proposed instruction-tuned version of our framework
produces state-of-the-art results on all the EBM-NLP datasets. Our project is
available at \url{https://github.com/shrimonmuke0202/AlpaPICO.git}.
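To make the unsupervised in-context extraction setup concrete, here is a minimal sketch of prompting a generic instruction-following causal LLM (via the Hugging Face transformers library) to pull PICO spans out of a trial abstract without labeled examples. The model name, prompt wording, and output format are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of in-context PICO extraction with a generic causal LLM.
# Model name and prompt are assumptions for illustration only.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"  # assumed placeholder; any instruction-following LLM works

PROMPT_TEMPLATE = (
    "Extract the PICO elements from the clinical trial abstract below.\n"
    "Answer with one line per element:\n"
    "Population: ...\nIntervention: ...\nComparator: ...\nOutcome: ...\n\n"
    "Abstract:\n{abstract}\n\nPICO elements:\n"
)

def extract_pico(abstract: str, max_new_tokens: int = 256) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")
    inputs = tokenizer(PROMPT_TEMPLATE.format(abstract=abstract),
                       return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)
    # Keep only the newly generated continuation, i.e. the PICO lines.
    generated = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)

if __name__ == "__main__":
    sample = ("120 adults with type 2 diabetes were randomised to metformin "
              "or placebo; the primary outcome was change in HbA1c at 24 weeks.")
    print(extract_pico(sample))
```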
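For the oracle setting, the abstract describes instruction tuning with Low-Rank Adaptation. Below is a minimal sketch of attaching LoRA adapters to a causal LLM using the Hugging Face PEFT library; the base model, rank, and target modules are assumptions for illustration rather than the paper's reported hyperparameters.

```python
# Minimal sketch of LoRA-based instruction tuning for PICO extraction.
# Base model and LoRA hyperparameters are illustrative assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumed placeholder base model
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                  # low-rank dimension of the adapter matrices
    lora_alpha=32,                         # scaling factor for the adapter updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```

Because the frozen base weights are untouched and gradients flow only through the small rank-r adapter matrices, this kind of setup is what makes fine-tuning a very large model feasible in a low-resource environment.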