Xinghang Wang , Haibo Tao , Bin Wang , Huaiping Jin , Zhenhui Li
{"title":"CFI-ViT: A coarse-to-fine inference based vision transformer for gastric cancer subtype detection using pathological images","authors":"Xinghang Wang , Haibo Tao , Bin Wang , Huaiping Jin , Zhenhui Li","doi":"10.1016/j.bspc.2024.107160","DOIUrl":null,"url":null,"abstract":"<div><div>Accurate detection of histopathological cancer subtypes is crucial for personalized treatment. Currently, deep learning methods based on histopathology images have become an effective solution to this problem. However, existing deep learning methods for histopathology image classification often suffer from high computational complexity, not considering the variability of different regions, and failing to synchronize the focus on local–global information effectively. To address these issues, we propose a coarse-to-fine inference based vision transformer (ViT) network (CFI-ViT) for pathological image detection of gastric cancer subtypes. CFI-ViT combines global attention and discriminative and differentiable modules to achieve two-stage inference. In the coarse inference stage, a ViT model with relative position embedding is employed to extract global information from the input images. If the critical information is not sufficiently identified, the differentiable module is adopted to extract local image regions with discrimination for fine-grained screening in the fine inference stage. The effectiveness and superiority of the proposed CFI-ViT method have been validated through three pathological image datasets of gastric cancer, including one private dataset clinically collected from Yunnan Cancer Hospital in China and two publicly available datasets, i.e., HE-GHI-DS and TCGA-STAD. The experimental results demonstrate that CFI-ViT achieves superior recognition accuracy and generalization performance compared to traditional methods, while using only 80 % of the computational resources required by the ViT model.</div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"100 ","pages":"Article 107160"},"PeriodicalIF":4.9000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biomedical Signal Processing and Control","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1746809424012187","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
引用次数: 0
Abstract
Accurate detection of histopathological cancer subtypes is crucial for personalized treatment. Currently, deep learning methods based on histopathology images have become an effective solution to this problem. However, existing deep learning methods for histopathology image classification often suffer from high computational complexity, not considering the variability of different regions, and failing to synchronize the focus on local–global information effectively. To address these issues, we propose a coarse-to-fine inference based vision transformer (ViT) network (CFI-ViT) for pathological image detection of gastric cancer subtypes. CFI-ViT combines global attention and discriminative and differentiable modules to achieve two-stage inference. In the coarse inference stage, a ViT model with relative position embedding is employed to extract global information from the input images. If the critical information is not sufficiently identified, the differentiable module is adopted to extract local image regions with discrimination for fine-grained screening in the fine inference stage. The effectiveness and superiority of the proposed CFI-ViT method have been validated through three pathological image datasets of gastric cancer, including one private dataset clinically collected from Yunnan Cancer Hospital in China and two publicly available datasets, i.e., HE-GHI-DS and TCGA-STAD. The experimental results demonstrate that CFI-ViT achieves superior recognition accuracy and generalization performance compared to traditional methods, while using only 80 % of the computational resources required by the ViT model.
期刊介绍:
Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with the practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management.
Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.