疾病诊断辅助的通才医学语言模型

IF 58.7 1区医学 Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Nature Medicine Pub Date : 2025-01-08 DOI:10.1038/s41591-024-03416-6

Xiaohong Liu, Hao Liu, Guoxing Yang, Zeyu Jiang, Shuguang Cui, Zhaoze Zhang, Huan Wang, Liyuan Tao, Yongchang Sun, Zhu Song, Tianpei Hong, Jin Yang, Tianrun Gao, Jiangjiang Zhang, Xiaohu Li, Jing Zhang, Ye Sang, Zhao Yang, Kanmin Xue, Song Wu, Ping Zhang, Jian Yang, Chunli Song, Guangyu Wang

{"title":"疾病诊断辅助的通才医学语言模型","authors":"Xiaohong Liu, Hao Liu, Guoxing Yang, Zeyu Jiang, Shuguang Cui, Zhaoze Zhang, Huan Wang, Liyuan Tao, Yongchang Sun, Zhu Song, Tianpei Hong, Jin Yang, Tianrun Gao, Jiangjiang Zhang, Xiaohu Li, Jing Zhang, Ye Sang, Zhao Yang, Kanmin Xue, Song Wu, Ping Zhang, Jian Yang, Chunli Song, Guangyu Wang","doi":"10.1038/s41591-024-03416-6","DOIUrl":null,"url":null,"abstract":"<p>The delivery of accurate diagnoses is crucial in healthcare and represents the gateway to appropriate and timely treatment. Although recent large language models (LLMs) have demonstrated impressive capabilities in few-shot or zero-shot learning, their effectiveness in clinical diagnosis remains unproven. Here we present MedFound, a generalist medical language model with 176 billion parameters, pre-trained on a large-scale corpus derived from diverse medical text and real-world clinical records. We further fine-tuned MedFound to learn physicians’ inferential diagnosis with a self-bootstrapping strategy-based chain-of-thought approach and introduced a unified preference alignment framework to align it with standard clinical practice. Extensive experiments demonstrate that our medical LLM outperforms other baseline LLMs and specialized models in in-distribution (common diseases), out-of-distribution (external validation) and long-tailed distribution (rare diseases) scenarios across eight specialties. Further ablation studies indicate the effectiveness of key components in our medical LLM training approach. We conducted a comprehensive evaluation of the clinical applicability of LLMs for diagnosis involving artificial intelligence (AI) versus physician comparison, AI-assistance study and human evaluation framework. Our proposed framework incorporates eight clinical evaluation metrics, covering capabilities such as medical record summarization, diagnostic reasoning and risk management. Our findings demonstrate the model’s feasibility in assisting physicians with disease diagnosis as part of the clinical workflow.</p>","PeriodicalId":19037,"journal":{"name":"Nature Medicine","volume":"82 1","pages":""},"PeriodicalIF":58.7000,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A generalist medical language model for disease diagnosis assistance\",\"authors\":\"Xiaohong Liu, Hao Liu, Guoxing Yang, Zeyu Jiang, Shuguang Cui, Zhaoze Zhang, Huan Wang, Liyuan Tao, Yongchang Sun, Zhu Song, Tianpei Hong, Jin Yang, Tianrun Gao, Jiangjiang Zhang, Xiaohu Li, Jing Zhang, Ye Sang, Zhao Yang, Kanmin Xue, Song Wu, Ping Zhang, Jian Yang, Chunli Song, Guangyu Wang\",\"doi\":\"10.1038/s41591-024-03416-6\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>The delivery of accurate diagnoses is crucial in healthcare and represents the gateway to appropriate and timely treatment. Although recent large language models (LLMs) have demonstrated impressive capabilities in few-shot or zero-shot learning, their effectiveness in clinical diagnosis remains unproven. Here we present MedFound, a generalist medical language model with 176 billion parameters, pre-trained on a large-scale corpus derived from diverse medical text and real-world clinical records. We further fine-tuned MedFound to learn physicians’ inferential diagnosis with a self-bootstrapping strategy-based chain-of-thought approach and introduced a unified preference alignment framework to align it with standard clinical practice. Extensive experiments demonstrate that our medical LLM outperforms other baseline LLMs and specialized models in in-distribution (common diseases), out-of-distribution (external validation) and long-tailed distribution (rare diseases) scenarios across eight specialties. Further ablation studies indicate the effectiveness of key components in our medical LLM training approach. We conducted a comprehensive evaluation of the clinical applicability of LLMs for diagnosis involving artificial intelligence (AI) versus physician comparison, AI-assistance study and human evaluation framework. Our proposed framework incorporates eight clinical evaluation metrics, covering capabilities such as medical record summarization, diagnostic reasoning and risk management. Our findings demonstrate the model’s feasibility in assisting physicians with disease diagnosis as part of the clinical workflow.</p>\",\"PeriodicalId\":19037,\"journal\":{\"name\":\"Nature Medicine\",\"volume\":\"82 1\",\"pages\":\"\"},\"PeriodicalIF\":58.7000,\"publicationDate\":\"2025-01-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Nature Medicine\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1038/s41591-024-03416-6\",\"RegionNum\":1,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMISTRY & MOLECULAR BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1038/s41591-024-03416-6","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

提供准确的诊断在医疗保健中至关重要，并代表了适当和及时治疗的门户。尽管最近的大型语言模型（llm）在少量或零次学习中表现出了令人印象深刻的能力，但它们在临床诊断中的有效性仍未得到证实。在这里，我们提出MedFound，一个具有1760亿个参数的通用医学语言模型，在来自不同医学文本和现实世界临床记录的大规模语料库上进行预训练。我们进一步对MedFound进行了微调，通过基于自我引导策略的思维链方法来学习医生的推理诊断，并引入了统一的偏好对齐框架，使其与标准临床实践保持一致。广泛的实验表明，我们的医学LLM在分布内（常见疾病）、分布外（外部验证）和长尾分布（罕见疾病）场景中优于其他基线LLM和专业模型。进一步的消融研究表明，我们的医学法学硕士培训方法的关键组成部分是有效的。我们对llm用于诊断的临床适用性进行了全面评估，包括人工智能（AI）与医生比较、人工智能辅助研究和人类评估框架。我们提出的框架包含八个临床评估指标，涵盖了病历总结、诊断推理和风险管理等功能。我们的研究结果证明了该模型在协助医生进行疾病诊断作为临床工作流程的一部分的可行性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A generalist medical language model for disease diagnosis assistance

The delivery of accurate diagnoses is crucial in healthcare and represents the gateway to appropriate and timely treatment. Although recent large language models (LLMs) have demonstrated impressive capabilities in few-shot or zero-shot learning, their effectiveness in clinical diagnosis remains unproven. Here we present MedFound, a generalist medical language model with 176 billion parameters, pre-trained on a large-scale corpus derived from diverse medical text and real-world clinical records. We further fine-tuned MedFound to learn physicians’ inferential diagnosis with a self-bootstrapping strategy-based chain-of-thought approach and introduced a unified preference alignment framework to align it with standard clinical practice. Extensive experiments demonstrate that our medical LLM outperforms other baseline LLMs and specialized models in in-distribution (common diseases), out-of-distribution (external validation) and long-tailed distribution (rare diseases) scenarios across eight specialties. Further ablation studies indicate the effectiveness of key components in our medical LLM training approach. We conducted a comprehensive evaluation of the clinical applicability of LLMs for diagnosis involving artificial intelligence (AI) versus physician comparison, AI-assistance study and human evaluation framework. Our proposed framework incorporates eight clinical evaluation metrics, covering capabilities such as medical record summarization, diagnostic reasoning and risk management. Our findings demonstrate the model’s feasibility in assisting physicians with disease diagnosis as part of the clinical workflow.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Nature Medicine 医学-生化与分子生物学

CiteScore

100.90

自引率

0.70%

发文量

525

审稿时长

1 months

期刊介绍： Nature Medicine is a monthly journal publishing original peer-reviewed research in all areas of medicine. The publication focuses on originality, timeliness, interdisciplinary interest, and the impact on improving human health. In addition to research articles, Nature Medicine also publishes commissioned content such as News, Reviews, and Perspectives. This content aims to provide context for the latest advances in translational and clinical research, reaching a wide audience of M.D. and Ph.D. readers. All editorial decisions for the journal are made by a team of full-time professional editors. Nature Medicine consider all types of clinical research, including: -Case-reports and small case series -Clinical trials, whether phase 1, 2, 3 or 4 -Observational studies -Meta-analyses -Biomarker studies -Public and global health studies Nature Medicine is also committed to facilitating communication between translational and clinical researchers. As such, we consider “hybrid” studies with preclinical and translational findings reported alongside data from clinical studies.

期刊最新文献

Plasma proteomic evidence for increased β-amyloid pathology after SARS-CoV-2 infection Estimating the true number of traumatic deaths in Gaza Recognizing ultra-processed foods in grocery stores Toward a biological definition of neuronal and glial synucleinopathies A roadmap for engagement and inclusion of LGBTQIA+ people in clinical research