When multiple instance learning meets foundation models: Advancing histological whole slide image analysis

IF 11.8 | Region 1 (Medicine) | JCR Q1: Computer Science, Artificial Intelligence | Medical Image Analysis | Pub Date: 2025-01-14 | DOI: 10.1016/j.media.2025.103456

Hongming Xu, Mingkang Wang, Duanbo Shi, Huamin Qin, Yunpeng Zhang, Zaiyi Liu, Anant Madabhushi, Peng Gao, Fengyu Cong, Cheng Lu

Medical Image Analysis, Volume 101, Article 103456. Journal Article. https://www.sciencedirect.com/science/article/pii/S1361841525000040

Citation count: 0

Abstract

Deep multiple instance learning (MIL) pipelines are the mainstream weakly supervised learning methodologies for whole slide image (WSI) classification. However, it remains unclear how these widely used approaches compare to each other, given the recent proliferation of foundation models (FMs) for patch-level embedding and the diversity of slide-level aggregations. This paper implemented and systematically compared six FMs and six recent MIL methods by combining different feature extractors and aggregators across seven clinically relevant end-to-end prediction tasks, using WSIs from 4044 patients with four different cancer types. We tested state-of-the-art (SOTA) FMs in computational pathology, including CTransPath, PathoDuet, PLIP, CONCH, and UNI, as patch-level feature extractors. Feature aggregators, such as attention-based pooling, transformers, and dynamic graphs, were thoroughly tested. Our experiments on cancer grading, biomarker status prediction, and microsatellite instability (MSI) prediction suggest that (1) FMs like UNI, trained with more diverse histological images, outperform generic models with smaller training datasets in patch embeddings, significantly enhancing downstream MIL classification accuracy and model training convergence speed; (2) instance feature fine-tuning, known as online feature re-embedding, which captures both fine-grained details and spatial interactions, can often further improve WSI classification performance; and (3) FMs advance MIL models by enabling promising grading classification, biomarker status prediction, and MSI prediction without requiring pixel- or patch-level annotations. These findings encourage the development of advanced, domain-specific FMs aimed at more universally applicable diagnostic tasks, in line with the evolving needs of clinical AI in pathology.
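To make the pipeline described in the abstract concrete, the following is a minimal sketch of attention-based pooling, one of the slide-level aggregators the abstract mentions. It is not the authors' implementation. It assumes patch embeddings have already been produced by a frozen feature extractor, and it uses random toy vectors in their place. The function name `attention_mil_pool`, the parameters `V` and `w`, and all sizes are illustrative assumptions; in a real pipeline `V` and `w` would be learned from slide-level labels.

```python
import math
import random


def attention_mil_pool(patch_feats, V, w):
    """Pool patch embeddings into one slide-level embedding.

    patch_feats: list of n patch vectors, each of length d
    V: projection matrix given as `hidden` rows of length d (hypothetical name)
    w: vector of length `hidden` (hypothetical name)
    The attention weight of patch k is softmax_k(w . tanh(V h_k)).
    """
    scores = []
    for h in patch_feats:
        # Project the patch embedding and squash it with tanh.
        hidden = [math.tanh(sum(v_i * h_i for v_i, h_i in zip(row, h))) for row in V]
        # One scalar relevance score per patch.
        scores.append(sum(w_j * z_j for w_j, z_j in zip(w, hidden)))

    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]

    # Slide embedding is the attention-weighted sum of patch embeddings.
    d = len(patch_feats[0])
    slide_feat = [sum(a * h[i] for a, h in zip(attn, patch_feats)) for i in range(d)]
    return slide_feat, attn


# Toy stand-ins for foundation-model patch embeddings (sizes are illustrative).
random.seed(0)
n_patches, d, hidden_dim = 20, 8, 4
patch_feats = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_patches)]
V = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(hidden_dim)]
w = [random.gauss(0, 0.1) for _ in range(hidden_dim)]

slide_feat, attn = attention_mil_pool(patch_feats, V, w)
```

The returned `slide_feat` would then be passed to a slide-level classifier trained only on slide labels, which is why no pixel- or patch-level annotation is needed. The weights in `attn` sum to one and indicate how much each patch contributes to the slide embedding.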

Source journal

Medical Image Analysis (Engineering & Technology: Biomedical Engineering)

CiteScore: 22.10
Self-citation rate: 6.40%
Articles published: 309
Review time: 6.6 months

Journal description: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.