Multiple Instance Learning for WSI: A comparative analysis of attention-based approaches

Martim Afonso, Praphulla M.S. Bhawsar, Monjoy Saha, Jonas S. Almeida, Arlindo L. Oliveira
Journal of Pathology Informatics, volume 15, Article 100403. Published 2024-10-20.
DOI: 10.1016/j.jpi.2024.100403
https://www.sciencedirect.com/science/article/pii/S2153353924000427

Abstract

Whole slide images (WSIs), obtained by high-resolution digital scanning of microscope slides at multiple scales, are the cornerstone of modern digital pathology. However, they pose a particular challenge for artificial intelligence (AI)-based or AI-mediated analysis because pathology labeling is typically done at the slide level rather than the tile level. Medical diagnoses are recorded at the specimen level, and oncogene mutation status is likewise obtained experimentally and recorded at the slide level by initiatives such as The Cancer Genome Atlas (TCGA). This creates a dual challenge: (a) accurately predicting the overall cancer phenotype and (b) finding out which cellular morphologies are associated with it at the tile level. To better understand and address these challenges, two existing weakly supervised Multiple Instance Learning (MIL) approaches were explored and compared: Attention MIL (AMIL) and Additive MIL (AdMIL). These architectures were analyzed on tumor detection (a task on which these models have previously obtained good results) and TP53 mutation detection (a much less explored task). For tumor detection, we built a dataset from Lung Squamous Cell Carcinoma (TCGA-LUSC) slides, with 349 positive and 349 negative slides. The patches were extracted at 5× magnification. For TP53 mutation detection, we explored a dataset built from Invasive Breast Carcinoma (TCGA-BRCA) slides, with 347 positive and 347 negative slides. In this case, we explored three magnification levels: 5×, 10×, and 20×. Our results show that a modified additive implementation of MIL matched the performance of the reference implementation (AUC 0.96) and was only slightly outperformed by AMIL (AUC 0.97) on the tumor detection task. TP53 mutation detection was most sensitive to features at the higher magnifications, where cellular morphology is resolved.
More interestingly from the perspective of the molecular pathologist, we highlight the possible ability of these MIL architectures to identify distinct sensitivities to morphological features (through the detection of regions of interest, ROIs) at different magnification levels. This ability of the models to produce tile-level ROIs is very appealing to pathologists, as it opens the possibility of integrating these algorithms into a digital staining application for analysis, facilitating both navigation through these high-dimensional images and the diagnostic process.
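To make the difference between the two compared architectures concrete, the following is a minimal sketch of how attention-based and additive MIL pooling are commonly formulated in the MIL literature. It is not the authors' implementation: the abstract does not give architectural details, so the tanh attention scorer, the toy one-hidden-layer classifier `head`, the random parameters, and all names and dimensions are assumptions chosen for illustration. A slide is treated as a bag of tile embeddings; AMIL pools the tiles into one slide embedding and classifies it, whereas AdMIL classifies each attention-scaled tile and sums the per-tile logits.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(tiles, V, w):
    """One weight per tile: a_k = softmax_k(w . tanh(V h_k))."""
    scores = [dot(w, [math.tanh(dot(row, h)) for row in V]) for h in tiles]
    return softmax(scores)


def head(x, W1, w2):
    """Toy classifier (assumed): one ReLU hidden layer, scalar logit."""
    hidden = [max(0.0, dot(row, x)) for row in W1]
    return dot(w2, hidden)


def amil(tiles, p):
    """Attention MIL: classify the attention-weighted mean of the tiles."""
    a = attention_weights(tiles, p["V"], p["w"])
    dim = len(tiles[0])
    pooled = [sum(a_k * h[d] for a_k, h in zip(a, tiles)) for d in range(dim)]
    return head(pooled, p["W1"], p["w2"]), a


def admil(tiles, p):
    """Additive MIL: classify each scaled tile, then sum the tile logits."""
    a = attention_weights(tiles, p["V"], p["w"])
    contributions = [
        head([a_k * x for x in h], p["W1"], p["w2"]) for a_k, h in zip(a, tiles)
    ]
    return sum(contributions), contributions


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 5 tiles per slide, 4-dimensional tile embeddings.
rng = random.Random(0)
DIM, HID, N_TILES = 4, 3, 5
params = {
    "V": random_matrix(rng, HID, DIM),
    "w": [rng.uniform(-1, 1) for _ in range(HID)],
    "W1": random_matrix(rng, HID, DIM),
    "w2": [rng.uniform(-1, 1) for _ in range(HID)],
}
tiles = random_matrix(rng, N_TILES, DIM)

amil_logit, attn = amil(tiles, params)
admil_logit, contribs = admil(tiles, params)
print("AMIL slide logit:", amil_logit)
print("AMIL attention per tile:", attn)
print("AdMIL slide logit:", admil_logit)
print("AdMIL per-tile contributions:", contribs)
```

In this sketch, the AMIL attention weights indicate only how strongly each tile was weighted, while the AdMIL contributions are signed per-tile logits that sum exactly to the slide-level logit. Either quantity can be mapped back to tile positions to produce the kind of tile-level ROI overlay the abstract describes.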
Source journal
Journal of Pathology Informatics (Medicine: Pathology and Forensic Medicine)
CiteScore: 3.70
Self-citation rate: 0.00%
Articles published: 2
Review time: 18 weeks
About the journal: The Journal of Pathology Informatics (JPI) is an open access peer-reviewed journal dedicated to the advancement of pathology informatics. This is the official journal of the Association for Pathology Informatics (API). The journal aims to publish broadly about pathology informatics and freely disseminate all articles worldwide. This journal is of interest to pathologists, informaticians, academics, researchers, health IT specialists, information officers, IT staff, vendors, and anyone with an interest in informatics. We encourage submissions from anyone with an interest in the field of pathology informatics. We publish all types of papers related to pathology informatics including original research articles, technical notes, reviews, viewpoints, commentaries, editorials, symposia, meeting abstracts, book reviews, and correspondence to the editors. All submissions are subject to rigorous peer review by the well-regarded editorial board and by expert referees in appropriate specialties.