Dual attention model with reinforcement learning for classification of histology whole-slide images

Manahil Raza, Ruqayya Awan, Raja Muhammad Saad Bashir, Talha Qaiser, Nasir M. Rajpoot

Computerized Medical Imaging and Graphics, vol. 118, 102466 (2024). DOI: 10.1016/j.compmedimag.2024.102466
Abstract
Digital whole slide images (WSIs) are generally captured at microscopic resolution and encompass extensive spatial data (several billion pixels per image). Directly feeding these images to deep learning models is computationally intractable due to memory constraints, while downsampling the WSIs risks incurring information loss. Alternatively, splitting the WSIs into smaller patches (or tiles) may result in a loss of important contextual information. In this paper, we propose a novel dual attention approach consisting of two main components, both inspired by a pathologist's visual examination process: the first, a soft attention model, processes a low-magnification view of the WSI to identify relevant regions of interest (ROIs), followed by a custom sampling method that extracts diverse and spatially distinct image tiles from the selected ROIs. The second component, a hard attention classification model, extracts a sequence of multi-resolution glimpses from each tile for classification. Since hard attention is non-differentiable, we train this component using reinforcement learning to predict the locations of the glimpses. This approach allows the model to focus on essential regions instead of processing the entire tile, thereby mirroring the way a pathologist reaches a diagnosis. The two components are trained in an end-to-end fashion using a joint loss function. The proposed model was evaluated on two WSI-level classification problems: Human epidermal growth factor receptor 2 (HER2) scoring on breast cancer histology images, and prediction of the Intact/Loss status of two Mismatch Repair (MMR) biomarkers from colorectal cancer histology images. We show that the proposed model achieves performance better than or comparable to state-of-the-art methods while processing less than 10% of the WSI at the highest magnification and reducing the time required to infer the WSI-level label by more than 75%. The code is available on GitHub.
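The abstract outlines the architecture but not its implementation. Below is a minimal, purely illustrative PyTorch sketch of the two ideas it describes: a soft attention map over a low-magnification thumbnail used to rank candidate ROI cells, and a recurrent hard attention glimpse policy trained with a REINFORCE-style term alongside a supervised loss. All module names, layer sizes, the number of glimpses, and the four-class output (loosely matching HER2 scores) are assumptions for the sketch, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftAttentionROI(nn.Module):
    """Scores a low-magnification WSI thumbnail and returns the top-k
    candidate ROI cells (first component of the dual attention idea)."""

    def __init__(self, in_ch: int = 3, k: int = 16):
        super().__init__()
        self.k = k
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.score = nn.Conv2d(64, 1, 1)  # one attention logit per spatial cell

    def forward(self, thumbnail):
        feats = self.backbone(thumbnail)                        # (B, 64, H', W')
        attn = F.softmax(self.score(feats).flatten(1), dim=1)   # (B, H'*W')
        topk = attn.topk(self.k, dim=1).indices                 # candidate ROI cells
        return attn, topk


class GlimpsePolicy(nn.Module):
    """Hard attention: a recurrent policy predicting where to take the next
    glimpse; sampling is non-differentiable, hence REINFORCE training."""

    def __init__(self, feat_dim: int = 128, n_classes: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.rnn = nn.GRUCell(feat_dim, feat_dim)
        self.loc_head = nn.Linear(feat_dim, 2)   # mean of a Gaussian over (x, y) in [-1, 1]
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, glimpse, h):
        h = self.rnn(self.encoder(glimpse), h)
        loc_mu = torch.tanh(self.loc_head(h))
        dist = torch.distributions.Normal(loc_mu, 0.1)
        loc = dist.sample()                       # stochastic draw, blocks gradients
        log_prob = dist.log_prob(loc).sum(-1)     # kept for the REINFORCE term
        return loc, log_prob, self.classifier(h), h


B, T, feat_dim = 2, 4, 128                        # T glimpses per tile (assumed)
roi_model, policy = SoftAttentionROI(), GlimpsePolicy(feat_dim)

thumbnail = torch.randn(B, 3, 256, 256)           # low-magnification WSI view
attn, topk = roi_model(thumbnail)                 # tiles would be sampled from these cells

# In real code each `loc` would drive a multi-resolution crop from the
# high-magnification tile; a random tensor stands in for that crop here.
glimpse, h = torch.randn(B, 3, 32, 32), torch.zeros(B, feat_dim)
log_probs = []
for _ in range(T):
    loc, lp, logits, h = policy(glimpse, h)
    log_probs.append(lp)

label = torch.randint(0, 4, (B,))
reward = (logits.argmax(-1) == label).float()     # 1 if the tile was classified correctly
# Joint objective: supervised cross-entropy plus a REINFORCE term that
# rewards glimpse sequences leading to a correct prediction.
loss = F.cross_entropy(logits, label) - (reward * torch.stack(log_probs).sum(0)).mean()
loss.backward()
```

The reward-weighted sum of log-probabilities is the standard REINFORCE estimator for the non-differentiable glimpse locations; the paper's actual joint loss couples both components end-to-end, which this sketch only gestures at.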
Journal Introduction
The purpose of the journal Computerized Medical Imaging and Graphics is to act as a source for the exchange of research results concerning algorithmic advances, development, and application of digital imaging in disease detection, diagnosis, intervention, prevention, precision medicine, and population health. Included in the journal will be articles on novel computerized imaging or visualization techniques, including artificial intelligence and machine learning, augmented reality for surgical planning and guidance, big biomedical data visualization, computer-aided diagnosis, computerized-robotic surgery, image-guided therapy, imaging scanning and reconstruction, mobile and tele-imaging, radiomics, and imaging integration and modeling with other information relevant to digital health. The types of biomedical imaging include: magnetic resonance, computed tomography, ultrasound, nuclear medicine, X-ray, microwave, optical and multi-photon microscopy, video and sensory imaging, and the convergence of biomedical images with other non-imaging datasets.