CIMIL-CRC: A clinically-informed multiple instance learning framework for patient-level colorectal cancer molecular subtypes classification from H&E stained images

IF 4.9 | Zone 2 (Medicine) | Q1, Computer Science, Interdisciplinary Applications | Computer Methods and Programs in Biomedicine | Pub Date: 2024-11-19 | DOI: 10.1016/j.cmpb.2024.108513

Hadar Hezi, Matan Gelber, Alexander Balabanov, Yosef E. Maruvka, Moti Freiman

Computer Methods and Programs in Biomedicine, volume 259, Article 108513. Full text: https://www.sciencedirect.com/science/article/pii/S0169260724005066


Abstract

Background and objective:

Treatment approaches for colorectal cancer (CRC) are highly dependent on the molecular subtype, as immunotherapy has shown efficacy in cases with microsatellite instability (MSI) but is ineffective for the microsatellite stable (MSS) subtype. Deep neural networks (DNNs) show promise for automating the differentiation of CRC subtypes from hematoxylin and eosin (H&E) stained whole-slide images (WSIs). Because WSIs are extremely large, multiple instance learning (MIL) techniques are typically used. However, existing MIL methods focus on identifying the most representative image patches for classification, which may discard critical information. Additionally, these methods often overlook clinically relevant information, such as the tendency of MSI tumors to occur predominantly in the proximal (right-sided) colon.

Methods:

We introduce ‘CIMIL-CRC’, a DNN framework that: (1) solves the MSI/MSS MIL problem by efficiently combining a pre-trained feature extraction model with principal component analysis (PCA) to aggregate information from all patches, and (2) integrates clinical priors, particularly the tumor location within the colon, into the model to enhance patient-level classification accuracy. We assessed our CIMIL-CRC method using the average area under the receiver operating characteristic curve (AUROC) from a 5-fold cross-validation experimental setup for model development on the TCGA-CRC-DX cohort, contrasting it with a baseline patch-level classification, a MIL-only approach, and a clinically-informed patch-level classification approach.
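
As a rough illustration of the two ingredients described above (aggregating all patch features with PCA, then adding tumor location), here is a minimal Python sketch. It is not the authors' implementation: the function names `aggregate_patient` and `patient_vector`, the choice of flattening the top principal directions into one vector, the binary proximal/distal encoding, and the toy array sizes are all assumptions made for illustration.

```python
import numpy as np


def aggregate_patient(patch_features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Summarize all patches of one patient by their top principal directions.

    patch_features: (n_patches, feature_dim) array, e.g. from a pre-trained extractor.
    Returns a flat vector of length n_components * feature_dim
    (assumes n_patches and feature_dim are both >= n_components).
    """
    centered = patch_features - patch_features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Principal directions are defined only up to sign; flip each one so its
    # largest-magnitude entry is positive, which makes the output deterministic.
    idx = np.abs(components).argmax(axis=1)
    signs = np.sign(components[np.arange(len(components)), idx])
    return (components * signs[:, None]).ravel()


def patient_vector(patch_features: np.ndarray, tumor_is_proximal: bool,
                   n_components: int = 2) -> np.ndarray:
    """Concatenate the PCA summary with a binary tumor-location prior (assumed encoding)."""
    summary = aggregate_patient(patch_features, n_components)
    return np.concatenate([summary, [1.0 if tumor_is_proximal else 0.0]])


rng = np.random.default_rng(0)
patches = rng.normal(size=(50, 8))  # toy data: 50 patches, 8-dim features
vec = patient_vector(patches, tumor_is_proximal=True)
print(vec.shape)  # (17,)
```

Because this summary depends only on the covariance of the patch features, it does not change when the patches are reordered, which suits the bag-of-instances setting of MIL. The resulting vector could then be fed to any patient-level classifier.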

Results:

Our CIMIL-CRC outperformed all methods (AUROC: 0.92 ± 0.002 (95% CI 0.91–0.92), vs. 0.79 ± 0.02 (95% CI 0.76–0.82), 0.86 ± 0.01 (95% CI 0.85–0.88), and 0.87 ± 0.01 (95% CI 0.86–0.88), respectively). The improvement was statistically significant. To the best of our knowledge, this is the best result achieved for MSI/MSS classification on this dataset.
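
For readers unfamiliar with how figures of this form are typically produced, the following Python sketch computes an AUROC from labels and scores and then summarizes per-fold AUROCs as a mean, a standard deviation, and a 95% confidence interval. The abstract does not state how the intervals were obtained, so the Student-t interval over five folds is an assumption, as are the function names `auroc` and `summarize`; the fold values in the example are made up and are not the paper's results.

```python
from math import sqrt
from statistics import mean, stdev


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Two-sided 95% Student-t critical value for 4 degrees of freedom (5 folds).
T_CRIT_DF4 = 2.776


def summarize(fold_aurocs, t_crit=T_CRIT_DF4):
    """Return (mean, sample std, (ci_low, ci_high)) across cross-validation folds."""
    m = mean(fold_aurocs)
    s = stdev(fold_aurocs)
    half_width = t_crit * s / sqrt(len(fold_aurocs))
    return m, s, (m - half_width, m + half_width)


# Toy example: 1 = MSI, 0 = MSS, with hypothetical patient-level scores.
toy_auc = auroc([1, 1, 0, 0, 1, 0], [0.9, 0.4, 0.35, 0.8, 0.7, 0.1])

# Hypothetical per-fold AUROCs, for illustration only.
m, s, (ci_low, ci_high) = summarize([0.90, 0.92, 0.91, 0.93, 0.89])
print(f"{m:.2f} +/- {s:.3f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

The pairwise definition of AUROC used here is quadratic in the number of patients, which is fine for a sketch; a rank-based implementation would be preferable for large cohorts.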

Conclusion:

Our CIMIL-CRC method holds promise for offering insights into the key representations of histopathological images and suggests a straightforward implementation.


Source journal: Computer Methods and Programs in Biomedicine (Engineering and Technology; Engineering: Biomedical)

CiteScore: 12.30
Self-citation rate: 6.60%
Articles published: 601
Review time: 135 days

About the journal: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.

Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.