使用临床记录预测术后风险的大型语言模型的基本能力

IF 15.1 1区医学 Q1 HEALTH CARE SCIENCES & SERVICES NPJ Digital Medicine Pub Date : 2025-02-11 DOI:10.1038/s41746-025-01489-2

Charles Alba, Bing Xue, Joanna Abraham, Thomas Kannampallil, Chenyang Lu

{"title":"使用临床记录预测术后风险的大型语言模型的基本能力","authors":"Charles Alba, Bing Xue, Joanna Abraham, Thomas Kannampallil, Chenyang Lu","doi":"10.1038/s41746-025-01489-2","DOIUrl":null,"url":null,"abstract":"<p>Clinical notes recorded during a patient’s perioperative journey holds immense informational value. Advances in large language models (LLMs) offer opportunities for bridging this gap. Using 84,875 preoperative notes and its associated surgical cases from 2018 to 2021, we examine the performance of LLMs in predicting six postoperative risks using various fine-tuning strategies. Pretrained LLMs outperformed traditional word embeddings by an absolute AUROC of 38.3% and AUPRC of 33.2%. Self-supervised fine-tuning further improved performance by 3.2% and 1.5%. Incorporating labels into training further increased AUROC by 1.8% and AUPRC by 2%. The highest performance was achieved with a unified foundation model, with improvements of 3.6% for AUROC and 2.6% for AUPRC compared to self-supervision, highlighting the foundational capabilities of LLMs in predicting postoperative risks, which could be potentially beneficial when deployed for perioperative care.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"29 1","pages":""},"PeriodicalIF":15.1000,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The foundational capabilities of large language models in predicting postoperative risks using clinical notes\",\"authors\":\"Charles Alba, Bing Xue, Joanna Abraham, Thomas Kannampallil, Chenyang Lu\",\"doi\":\"10.1038/s41746-025-01489-2\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Clinical notes recorded during a patient’s perioperative journey holds immense informational value. Advances in large language models (LLMs) offer opportunities for bridging this gap. Using 84,875 preoperative notes and its associated surgical cases from 2018 to 2021, we examine the performance of LLMs in predicting six postoperative risks using various fine-tuning strategies. Pretrained LLMs outperformed traditional word embeddings by an absolute AUROC of 38.3% and AUPRC of 33.2%. Self-supervised fine-tuning further improved performance by 3.2% and 1.5%. Incorporating labels into training further increased AUROC by 1.8% and AUPRC by 2%. The highest performance was achieved with a unified foundation model, with improvements of 3.6% for AUROC and 2.6% for AUPRC compared to self-supervision, highlighting the foundational capabilities of LLMs in predicting postoperative risks, which could be potentially beneficial when deployed for perioperative care.</p>\",\"PeriodicalId\":19349,\"journal\":{\"name\":\"NPJ Digital Medicine\",\"volume\":\"29 1\",\"pages\":\"\"},\"PeriodicalIF\":15.1000,\"publicationDate\":\"2025-02-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"NPJ Digital Medicine\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1038/s41746-025-01489-2\",\"RegionNum\":1,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"NPJ Digital Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1038/s41746-025-01489-2","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}

引用次数: 0

摘要

病人围手术期的临床记录具有巨大的信息价值。大型语言模型（llm）的进步为弥合这一差距提供了机会。使用2018年至2021年的84,875例术前记录及其相关手术病例，我们检查了llm在使用各种微调策略预测六种术后风险方面的表现。预训练的llm比传统词嵌入的AUROC和AUPRC分别高出38.3%和33.2%。自我监督的微调进一步将性能提高了3.2%和1.5%。将标签纳入培训后，AUROC和AUPRC分别提高了1.8%和2%。统一的基础模型取得了最高的表现，与自我监督相比，AUROC和AUPRC分别提高了3.6%和2.6%，突出了llm预测术后风险的基础能力，这在围手术期护理中可能是有益的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

The foundational capabilities of large language models in predicting postoperative risks using clinical notes

Clinical notes recorded during a patient’s perioperative journey holds immense informational value. Advances in large language models (LLMs) offer opportunities for bridging this gap. Using 84,875 preoperative notes and its associated surgical cases from 2018 to 2021, we examine the performance of LLMs in predicting six postoperative risks using various fine-tuning strategies. Pretrained LLMs outperformed traditional word embeddings by an absolute AUROC of 38.3% and AUPRC of 33.2%. Self-supervised fine-tuning further improved performance by 3.2% and 1.5%. Incorporating labels into training further increased AUROC by 1.8% and AUPRC by 2%. The highest performance was achieved with a unified foundation model, with improvements of 3.6% for AUROC and 2.6% for AUPRC compared to self-supervision, highlighting the foundational capabilities of LLMs in predicting postoperative risks, which could be potentially beneficial when deployed for perioperative care.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

NPJ Digital Medicine Multiple-

CiteScore

25.10

自引率

3.30%

发文量

170

审稿时长

15 weeks

期刊介绍： npj Digital Medicine is an online open-access journal that focuses on publishing peer-reviewed research in the field of digital medicine. The journal covers various aspects of digital medicine, including the application and implementation of digital and mobile technologies in clinical settings, virtual healthcare, and the use of artificial intelligence and informatics. The primary goal of the journal is to support innovation and the advancement of healthcare through the integration of new digital and mobile technologies. When determining if a manuscript is suitable for publication, the journal considers four important criteria: novelty, clinical relevance, scientific rigor, and digital innovation.