{"title":"从预训练到微调:深入分析生物医学领域的大型语言模型。","authors":"Agnese Bonfigli , Luca Bacco , Mario Merone , Felice Dell’Orletta","doi":"10.1016/j.artmed.2024.103003","DOIUrl":null,"url":null,"abstract":"<div><div>In this study, we delve into the adaptation and effectiveness of Transformer-based, pre-trained Large Language Models (LLMs) within the biomedical domain, a field that poses unique challenges due to its complexity and the specialized nature of its data. Building on the foundation laid by the transformative architecture of Transformers, we investigate the nuanced dynamics of LLMs through a multifaceted lens, focusing on two domain-specific tasks, i.e., Natural Language Inference (NLI) and Named Entity Recognition (NER). Our objective is to bridge the knowledge gap regarding how these models’ downstream performances correlate with their capacity to encapsulate task-relevant information. To achieve this goal, we probed and analyzed the inner encoding and attention mechanisms in LLMs, both encoder- and decoder-based, tailored for either general or biomedical-specific applications. This examination occurs before and after the models are fine-tuned across various data volumes. Our findings reveal that the models’ downstream effectiveness is intricately linked to specific patterns within their internal mechanisms, shedding light on the nuanced ways in which LLMs process and apply knowledge in the biomedical context. The source code for this paper is available at <span><span>https://github.com/agnesebonfigli99/LLMs-in-the-Biomedical-Domain</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"157 ","pages":"Article 103003"},"PeriodicalIF":6.1000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"From pre-training to fine-tuning: An in-depth analysis of Large Language Models in the biomedical domain\",\"authors\":\"Agnese Bonfigli , Luca Bacco , Mario Merone , Felice Dell’Orletta\",\"doi\":\"10.1016/j.artmed.2024.103003\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In this study, we delve into the adaptation and effectiveness of Transformer-based, pre-trained Large Language Models (LLMs) within the biomedical domain, a field that poses unique challenges due to its complexity and the specialized nature of its data. Building on the foundation laid by the transformative architecture of Transformers, we investigate the nuanced dynamics of LLMs through a multifaceted lens, focusing on two domain-specific tasks, i.e., Natural Language Inference (NLI) and Named Entity Recognition (NER). Our objective is to bridge the knowledge gap regarding how these models’ downstream performances correlate with their capacity to encapsulate task-relevant information. To achieve this goal, we probed and analyzed the inner encoding and attention mechanisms in LLMs, both encoder- and decoder-based, tailored for either general or biomedical-specific applications. This examination occurs before and after the models are fine-tuned across various data volumes. Our findings reveal that the models’ downstream effectiveness is intricately linked to specific patterns within their internal mechanisms, shedding light on the nuanced ways in which LLMs process and apply knowledge in the biomedical context. 
The source code for this paper is available at <span><span>https://github.com/agnesebonfigli99/LLMs-in-the-Biomedical-Domain</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":55458,\"journal\":{\"name\":\"Artificial Intelligence in Medicine\",\"volume\":\"157 \",\"pages\":\"Article 103003\"},\"PeriodicalIF\":6.1000,\"publicationDate\":\"2024-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Artificial Intelligence in Medicine\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0933365724002458\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence in Medicine","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0933365724002458","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
From pre-training to fine-tuning: An in-depth analysis of Large Language Models in the biomedical domain
In this study, we delve into the adaptation and effectiveness of Transformer-based, pre-trained Large Language Models (LLMs) within the biomedical domain, a field that poses unique challenges due to its complexity and the specialized nature of its data. Building on the foundation laid by the transformative architecture of Transformers, we investigate the nuanced dynamics of LLMs through a multifaceted lens, focusing on two domain-specific tasks, i.e., Natural Language Inference (NLI) and Named Entity Recognition (NER). Our objective is to bridge the knowledge gap regarding how these models’ downstream performances correlate with their capacity to encapsulate task-relevant information. To achieve this goal, we probed and analyzed the inner encoding and attention mechanisms in LLMs, both encoder- and decoder-based, tailored for either general or biomedical-specific applications. This examination occurs before and after the models are fine-tuned across various data volumes. Our findings reveal that the models’ downstream effectiveness is intricately linked to specific patterns within their internal mechanisms, shedding light on the nuanced ways in which LLMs process and apply knowledge in the biomedical context. The source code for this paper is available at https://github.com/agnesebonfigli99/LLMs-in-the-Biomedical-Domain.
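As an illustration of the kind of analysis the abstract describes, the minimal Python sketch below extracts layer-wise hidden states and attention maps from a pre-trained biomedical encoder and fits a simple linear probe on toy NLI-style sentence pairs. The model checkpoint, the example sentences, and the probe setup are illustrative assumptions for this sketch, not the authors' actual pipeline (see the linked repository for that).

import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

# Assumed biomedical encoder; the paper's exact checkpoints may differ.
MODEL_NAME = "dmis-lab/biobert-base-cased-v1.1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(
    MODEL_NAME, output_hidden_states=True, output_attentions=True
)
model.eval()

# Toy premise/hypothesis pairs with binary entailment labels (illustrative only).
pairs = [
    ("Aspirin inhibits platelet aggregation.", "Aspirin affects platelets.", 1),
    ("The patient has no history of diabetes.", "The patient is diabetic.", 0),
]

layer_features = {}  # layer index -> list of [CLS] vectors
labels = []

with torch.no_grad():
    for premise, hypothesis, label in pairs:
        enc = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
        out = model(**enc)
        labels.append(label)
        # out.hidden_states: (num_layers + 1) tensors of shape [1, seq_len, hidden_size]
        for layer, h in enumerate(out.hidden_states):
            layer_features.setdefault(layer, []).append(h[0, 0].numpy())
        # out.attentions: per-layer tensors of shape [1, num_heads, seq_len, seq_len];
        # attention-pattern analyses operate on tensors like these.

# Fit a linear probe per layer; with real data one would report held-out accuracy.
for layer, feats in layer_features.items():
    probe = LogisticRegression(max_iter=1000).fit(feats, labels)
    print(f"layer {layer:2d}: train accuracy = {probe.score(feats, labels):.2f}")

With the paper's actual datasets, a loop of this kind would be run both before and after fine-tuning, and across different fine-tuning data volumes, to relate layer-wise probing results to downstream NLI and NER performance.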
Journal introduction:
Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care.
Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.