CFI-ViT: A coarse-to-fine inference based vision transformer for gastric cancer subtype detection using pathological images

IF 4.9 2区医学 Q1 ENGINEERING, BIOMEDICAL Biomedical Signal Processing and Control Pub Date : 2024-11-06 DOI:10.1016/j.bspc.2024.107160

Xinghang Wang , Haibo Tao , Bin Wang , Huaiping Jin , Zhenhui Li

{"title":"CFI-ViT: A coarse-to-fine inference based vision transformer for gastric cancer subtype detection using pathological images","authors":"Xinghang Wang , Haibo Tao , Bin Wang , Huaiping Jin , Zhenhui Li","doi":"10.1016/j.bspc.2024.107160","DOIUrl":null,"url":null,"abstract":"<div><div>Accurate detection of histopathological cancer subtypes is crucial for personalized treatment. Currently, deep learning methods based on histopathology images have become an effective solution to this problem. However, existing deep learning methods for histopathology image classification often suffer from high computational complexity, not considering the variability of different regions, and failing to synchronize the focus on local–global information effectively. To address these issues, we propose a coarse-to-fine inference based vision transformer (ViT) network (CFI-ViT) for pathological image detection of gastric cancer subtypes. CFI-ViT combines global attention and discriminative and differentiable modules to achieve two-stage inference. In the coarse inference stage, a ViT model with relative position embedding is employed to extract global information from the input images. If the critical information is not sufficiently identified, the differentiable module is adopted to extract local image regions with discrimination for fine-grained screening in the fine inference stage. The effectiveness and superiority of the proposed CFI-ViT method have been validated through three pathological image datasets of gastric cancer, including one private dataset clinically collected from Yunnan Cancer Hospital in China and two publicly available datasets, i.e., HE-GHI-DS and TCGA-STAD. The experimental results demonstrate that CFI-ViT achieves superior recognition accuracy and generalization performance compared to traditional methods, while using only 80 % of the computational resources required by the ViT model.</div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"100 ","pages":"Article 107160"},"PeriodicalIF":4.9000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biomedical Signal Processing and Control","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1746809424012187","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}

引用次数: 0

Abstract

Accurate detection of histopathological cancer subtypes is crucial for personalized treatment. Currently, deep learning methods based on histopathology images have become an effective solution to this problem. However, existing deep learning methods for histopathology image classification often suffer from high computational complexity, not considering the variability of different regions, and failing to synchronize the focus on local–global information effectively. To address these issues, we propose a coarse-to-fine inference based vision transformer (ViT) network (CFI-ViT) for pathological image detection of gastric cancer subtypes. CFI-ViT combines global attention and discriminative and differentiable modules to achieve two-stage inference. In the coarse inference stage, a ViT model with relative position embedding is employed to extract global information from the input images. If the critical information is not sufficiently identified, the differentiable module is adopted to extract local image regions with discrimination for fine-grained screening in the fine inference stage. The effectiveness and superiority of the proposed CFI-ViT method have been validated through three pathological image datasets of gastric cancer, including one private dataset clinically collected from Yunnan Cancer Hospital in China and two publicly available datasets, i.e., HE-GHI-DS and TCGA-STAD. The experimental results demonstrate that CFI-ViT achieves superior recognition accuracy and generalization performance compared to traditional methods, while using only 80 % of the computational resources required by the ViT model.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

CFI-ViT：利用病理图像进行胃癌亚型检测的基于粗到细推理的视觉变换器

准确检测组织病理学癌症亚型对于个性化治疗至关重要。目前，基于组织病理学图像的深度学习方法已成为解决这一问题的有效方法。然而，现有的组织病理学图像分类深度学习方法往往存在计算复杂度高、未考虑不同区域的差异性、无法有效同步关注局部和全局信息等问题。为了解决这些问题，我们提出了一种基于视觉变换器（ViT）的粗到细推理网络（CFI-ViT），用于胃癌亚型的病理图像检测。CFI-ViT 结合了全局注意力、判别和可微分模块，实现了两阶段推理。在粗推理阶段，采用具有相对位置嵌入的 ViT 模型从输入图像中提取全局信息。如果关键信息识别不充分，则在精细推理阶段采用可微分模块提取具有区分度的局部图像区域，进行细粒度筛选。我们通过三个胃癌病理图像数据集验证了 CFI-ViT 方法的有效性和优越性，其中包括一个从中国云南省肿瘤医院临床收集的私有数据集和两个公开数据集，即 HE-GHI-DS 和 TCGA-STAD。实验结果表明，与传统方法相比，CFI-ViT 获得了更高的识别准确率和泛化性能，而所需的计算资源仅为 ViT 模型的 80%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Biomedical Signal Processing and Control 工程技术-工程：生物医学

CiteScore

9.80

自引率

13.70%

发文量

822

审稿时长

4 months

期刊介绍： Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with the practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management. Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.