BioInstruct: instruction tuning of large language models for biomedical natural language processing.

Journal of the American Medical Informatics Association (IF 4.7; CAS Zone 2, Medicine; JCR Q1, Computer Science, Information Systems). Pub Date: 2024-09-01. DOI: 10.1093/jamia/ocae122

Hieu Tran, Zhichao Yang, Zonghai Yao, Hong Yu

Journal Article. Pages: 1821-1832. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11339494/pdf/


Abstract

Objectives: To enhance the performance of large language models (LLMs) in biomedical natural language processing (BioNLP) by introducing a domain-specific instruction dataset and examining its impact when combined with multi-task learning principles.

Materials and methods: We created BioInstruct, a dataset of 25 005 instructions, to instruction-tune LLMs (LLaMA 1 and 2, 7B and 13B versions). The instructions were created by prompting the GPT-4 language model with 3 seed samples randomly drawn from 80 human-curated instructions. We employed Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. We then evaluated these instruction-tuned LLMs on several BioNLP tasks, which can be grouped into 3 major categories: question answering (QA), information extraction (IE), and text generation (GEN). We also examined whether the category of instructions (eg, QA, IE, or generation) affects model performance.
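The seed-sampling step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `build_generation_prompt`, the prompt wording, and the placeholder seed pool are all assumptions, and the call to GPT-4 itself is left out.

```python
import random


def build_generation_prompt(seed_pool, k=3, rng=None):
    """Draw k seed instructions without replacement and format one prompt.

    The prompt wording below is hypothetical; the abstract only states that
    3 seeds are drawn at random from a pool of 80 curated instructions.
    """
    rng = rng or random.Random()
    if len(seed_pool) < k:
        raise ValueError("seed pool is smaller than k")
    seeds = rng.sample(seed_pool, k)  # sampling without replacement
    lines = ["Here are example biomedical instructions:"]
    for i, seed in enumerate(seeds, start=1):
        lines.append(f"{i}. {seed}")
    lines.append("Write one new, different biomedical instruction.")
    return seeds, "\n".join(lines)


# Placeholder pool standing in for the 80 human-curated seeds (contents are made up).
seed_pool = [f"Seed instruction {i}" for i in range(1, 81)]

# A fixed seed makes the draw reproducible for this demonstration only.
seeds, prompt = build_generation_prompt(seed_pool, k=3, rng=random.Random(0))
print(prompt)
```

Repeating this draw with a fresh random sample for each request is one plausible way to vary the examples the generator sees, which is the usual motivation for sampling seeds rather than fixing them.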

Results and discussion: Compared with LLMs without instruction tuning, our instruction-tuned LLMs demonstrated marked performance gains: 17.3% in QA on the average accuracy metric, 5.7% in IE on the average F1 metric, and 96% in generation tasks on the average GPT-4 score metric. Our 7B-parameter instruction-tuned LLaMA 1 model was competitive with, or even surpassed, other biomedical LLMs that were also fine-tuned from LLaMA 1 with vast domain-specific data or a variety of tasks. Our results also show that the performance gain is significantly higher when instruction fine-tuning is conducted with closely related tasks. Our findings align with observations from multi-task learning, suggesting synergies between 2 tasks.
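For readers unfamiliar with the metrics named above, the sketch below shows how accuracy, a set-based F1, and a performance gain are typically computed. It is an assumption-laden illustration: the helper names and all numbers are made up, the exact F1 variant used in the paper is not stated in the abstract, and the abstract does not say whether the reported gains are absolute or relative, so both are shown.

```python
def accuracy(preds, golds):
    """Fraction of predictions that exactly match the gold labels."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and equal length")
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def f1_score(pred_items, gold_items):
    """Set-based F1: harmonic mean of precision and recall over extracted items."""
    pred_set, gold_set = set(pred_items), set(gold_items)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def gains(baseline, tuned):
    """Return (absolute, relative) gain of the tuned score over the baseline."""
    absolute = tuned - baseline
    relative = absolute / baseline
    return absolute, relative


# Toy data only; none of these values come from the paper.
qa_acc = accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"])
ie_f1 = f1_score(["aspirin", "ibuprofen"], ["aspirin", "warfarin"])
abs_gain, rel_gain = gains(baseline=0.40, tuned=0.50)
print(qa_acc, ie_f1, abs_gain, rel_gain)
```

The distinction matters when reading the headline figures: an improvement from 0.40 to 0.50 is a 10-point absolute gain but a 25% relative gain.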

Conclusion: The BioInstruct dataset serves as a valuable resource, and instruction-tuned LLMs lead to the best-performing BioNLP applications.




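Source journal: Journal of the American Medical Informatics Association (Medicine; Computer Science, Interdisciplinary Applications)

CiteScore: 14.50

Self-citation rate: 7.80%

Articles published: 230

Review time: 3-8 weeks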

About the journal: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.