A vision–language foundation model for precision oncology

IF 48.5 1区 综合性期刊 Q1 MULTIDISCIPLINARY SCIENCES Nature Pub Date : 2025-01-08 DOI:10.1038/s41586-024-08378-w
Jinxi Xiang, Xiyue Wang, Xiaoming Zhang, Yinghua Xi, Feyisope Eweje, Yijiang Chen, Yuchen Li, Colin Bergstrom, Matthew Gopaulchan, Ted Kim, Kun-Hsing Yu, Sierra Willens, Francesca Maria Olguin, Jeffrey J. Nirschl, Joel Neal, Maximilian Diehn, Sen Yang, Ruijiang Li
{"title":"A vision–language foundation model for precision oncology","authors":"Jinxi Xiang, Xiyue Wang, Xiaoming Zhang, Yinghua Xi, Feyisope Eweje, Yijiang Chen, Yuchen Li, Colin Bergstrom, Matthew Gopaulchan, Ted Kim, Kun-Hsing Yu, Sierra Willens, Francesca Maria Olguin, Jeffrey J. Nirschl, Joel Neal, Maximilian Diehn, Sen Yang, Ruijiang Li","doi":"10.1038/s41586-024-08378-w","DOIUrl":null,"url":null,"abstract":"Clinical decision-making is driven by multimodal data, including clinical notes and pathological characteristics. Artificial intelligence approaches that can effectively integrate multimodal data hold significant promise in advancing clinical care1,2. However, the scarcity of well-annotated multimodal datasets in clinical settings has hindered the development of useful models. In this study, we developed the Multimodal transformer with Unified maSKed modeling (MUSK), a vision–language foundation model designed to leverage large-scale, unlabelled, unpaired image and text data. MUSK was pretrained on 50 million pathology images from 11,577 patients and one billion pathology-related text tokens using unified masked modelling. It was further pretrained on one million pathology image–text pairs to efficiently align the vision and language features. With minimal or no further training, MUSK was tested in a wide range of applications and demonstrated superior performance across 23 patch-level and slide-level benchmarks, including image-to-text and text-to-image retrieval, visual question answering, image classification and molecular biomarker prediction. Furthermore, MUSK showed strong performance in outcome prediction, including melanoma relapse prediction, pan-cancer prognosis prediction and immunotherapy response prediction in lung and gastro-oesophageal cancers. MUSK effectively combined complementary information from pathology images and clinical reports and could potentially improve diagnosis and precision in cancer therapy. Trained on unlabelled, unpaired image and text data, the Multimodal transformer with Unified maSKed modeling excelled in outcome prediction, image-to-text retrieval and visual question answering, potentially improving cancer diagnosis and therapy precision.","PeriodicalId":18787,"journal":{"name":"Nature","volume":"638 8051","pages":"769-778"},"PeriodicalIF":48.5000,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature","FirstCategoryId":"103","ListUrlMain":"https://www.nature.com/articles/s41586-024-08378-w","RegionNum":1,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0

Abstract

Clinical decision-making is driven by multimodal data, including clinical notes and pathological characteristics. Artificial intelligence approaches that can effectively integrate multimodal data hold significant promise in advancing clinical care1,2. However, the scarcity of well-annotated multimodal datasets in clinical settings has hindered the development of useful models. In this study, we developed the Multimodal transformer with Unified maSKed modeling (MUSK), a vision–language foundation model designed to leverage large-scale, unlabelled, unpaired image and text data. MUSK was pretrained on 50 million pathology images from 11,577 patients and one billion pathology-related text tokens using unified masked modelling. It was further pretrained on one million pathology image–text pairs to efficiently align the vision and language features. With minimal or no further training, MUSK was tested in a wide range of applications and demonstrated superior performance across 23 patch-level and slide-level benchmarks, including image-to-text and text-to-image retrieval, visual question answering, image classification and molecular biomarker prediction. Furthermore, MUSK showed strong performance in outcome prediction, including melanoma relapse prediction, pan-cancer prognosis prediction and immunotherapy response prediction in lung and gastro-oesophageal cancers. MUSK effectively combined complementary information from pathology images and clinical reports and could potentially improve diagnosis and precision in cancer therapy. Trained on unlabelled, unpaired image and text data, the Multimodal transformer with Unified maSKed modeling excelled in outcome prediction, image-to-text retrieval and visual question answering, potentially improving cancer diagnosis and therapy precision.

Abstract Image

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
精准肿瘤学的视觉语言基础模型
临床决策是由多模态数据驱动的,包括临床记录和病理特征。能够有效整合多模态数据的人工智能方法在推进临床护理方面具有重大前景1,2。然而,临床环境中缺乏良好注释的多模态数据集阻碍了有用模型的发展。在本研究中,我们开发了具有统一掩模建模(MUSK)的多模态变压器,这是一种视觉语言基础模型,旨在利用大规模,未标记,未配对的图像和文本数据。使用统一的掩模建模,对来自11577名患者的5000万张病理图像和10亿个病理相关文本标记进行了MUSK的预训练。进一步对100万病理图像-文本对进行预训练,有效地对齐视觉和语言特征。在很少或没有进一步培训的情况下,MUSK在广泛的应用中进行了测试,并在23个补丁级和幻灯片级基准测试中表现出卓越的性能,包括图像到文本和文本到图像检索、视觉问答、图像分类和分子生物标志物预测。此外,MUSK在预后预测方面表现出色,包括预测肺癌和胃食管癌的黑色素瘤复发、泛癌预后和免疫治疗反应预测。MUSK有效地结合了病理图像和临床报告的互补信息,有可能提高癌症治疗的诊断和准确性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Nature
Nature 综合性期刊-综合性期刊
CiteScore
90.00
自引率
1.20%
发文量
3652
审稿时长
3 months
期刊介绍: Nature is a prestigious international journal that publishes peer-reviewed research in various scientific and technological fields. The selection of articles is based on criteria such as originality, importance, interdisciplinary relevance, timeliness, accessibility, elegance, and surprising conclusions. In addition to showcasing significant scientific advances, Nature delivers rapid, authoritative, insightful news, and interpretation of current and upcoming trends impacting science, scientists, and the broader public. The journal serves a dual purpose: firstly, to promptly share noteworthy scientific advances and foster discussions among scientists, and secondly, to ensure the swift dissemination of scientific results globally, emphasizing their significance for knowledge, culture, and daily life.
期刊最新文献
Magnetic muon measurements and gene-therapy advances win US$3 million Breakthrough prizes. US lawmakers intensify scrutiny of scientific-publishing practices. Immune cells have a surprising role in exercise endurance. Briefing Chat: Penguins pick up PFAS pollution. Ageing could prime women for autoimmune disorders.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1