CasUNeXt: A Cascaded Transformer With Intra- and Inter-Scale Information for Medical Image Segmentation

International Journal of Imaging Systems and Technology | IF 3 | Zone 4, Computer Science | JCR Q2, Engineering, Electrical & Electronic | Pub Date: 2024-09-21 | DOI: 10.1002/ima.23184

Junding Sun, Xiaopeng Zheng, Xiaosheng Wu, Chaosheng Tang, Shuihua Wang, Yudong Zhang

Volume 34, Issue 5. Full text: https://onlinelibrary.wiley.com/doi/10.1002/ima.23184

Citations: 0

Abstract

Due to the Transformer's ability to capture long-range dependencies through Self-Attention, it has shown immense potential in medical image segmentation. However, it lacks the capability to model local relationships between pixels. Therefore, many previous approaches embedded the Transformer into the CNN encoder. However, current methods often fall short in modeling the relationships between multi-scale features, specifically the spatial correspondence between features at different scales. This limitation can result in the ineffective capture of scale differences for each object and the loss of features for small targets. Furthermore, due to the high complexity of the Transformer, it is challenging to integrate local and global information within the same scale effectively. To address these limitations, we propose a novel backbone network called CasUNeXt, which features three appealing design elements: (1) We use the idea of cascade to redesign the way CNN and Transformer are combined to enhance modeling the unique interrelationships between multi-scale information. (2) We design a Cascaded Scale-wise Transformer Module capable of cross-scale interactions. It not only strengthens feature extraction within a single scale but also models interactions between different scales. (3) We overhaul the multi-head Channel Attention mechanism to enable it to model context information in feature maps from multiple perspectives within the channel dimension. These design features collectively enable CasUNeXt to better integrate local and global information and capture relationships between multi-scale features, thereby improving the performance of medical image segmentation. Through experimental comparisons on various benchmark datasets, our CasUNeXt method exhibits outstanding performance in medical image segmentation tasks, surpassing the current state-of-the-art methods.
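The abstract does not give the internals of the redesigned multi-head Channel Attention, so the block below is only a generic sketch of the underlying idea: attention computed between channels rather than between pixels, with the channels split into several heads so that each head forms its own channel-to-channel affinity map. The function name `multi_head_channel_attention`, the identity Q/K/V projections, and the `1/sqrt(N)` scaling are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_channel_attention(feat, num_heads):
    """Generic channel-wise self-attention over a (C, H, W) feature map.

    Hypothetical illustration only: Q, K and V are the input itself
    (identity projections); a real module would learn these projections.
    """
    c, h, w = feat.shape
    if c % num_heads != 0:
        raise ValueError("channels must be divisible by num_heads")
    n = h * w
    # Split channels into heads: (heads, C/heads, N), N = flattened pixels.
    x = feat.reshape(num_heads, c // num_heads, n)
    q, k, v = x, x, x
    # Channel-by-channel affinities inside each head: (heads, C/heads, C/heads).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(n)
    attn = softmax(scores, axis=-1)
    # Each output channel is a weighted mix of the channels in its head.
    out = attn @ v
    return out.reshape(c, h, w), attn


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
out, attn = multi_head_channel_attention(feat, num_heads=2)
print(out.shape, attn.shape)  # (8, 4, 4) (2, 4, 4)
```

Because the attention map has size (C/heads) x (C/heads) per head rather than N x N, its cost grows with the number of channels instead of the number of pixels, which is one common motivation for attending over the channel dimension in high-resolution segmentation features.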




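Source journal

International Journal of Imaging Systems and Technology (Engineering & Technology: Imaging Science & Photographic Technology)

CiteScore: 6.90
Self-citation rate: 6.10%
Articles published: 138
Review time: 3 months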

About the journal: The International Journal of Imaging Systems and Technology (IMA) is a forum for the exchange of ideas and results relevant to imaging systems, including imaging physics and informatics. The journal covers all imaging modalities in humans and animals. IMA accepts technically sound and scientifically rigorous research in the interdisciplinary field of imaging, including relevant algorithmic research and hardware and software development, and their applications relevant to medical research. The journal provides a platform to publish original research in structural and functional imaging. The journal is also open to imaging studies of the human body and of animals that describe novel diagnostic imaging and analysis methods. Technical, theoretical, and clinical research in both normal and clinical populations is encouraged. Submissions describing methods, software, databases, replication studies, as well as negative results are also considered. The scope of the journal includes, but is not limited to, the following in the context of biomedical research: Imaging and neuro-imaging modalities: structural MRI, functional MRI, PET, SPECT, CT, ultrasound, EEG, MEG, NIRS, etc.; Neuromodulation and brain stimulation techniques such as TMS and tDCS; Software and hardware for imaging, especially related to human and animal health; Image segmentation in normal and clinical populations; Pattern analysis and classification using machine learning techniques; Computational modeling and analysis; Brain connectivity and connectomics; Systems-level characterization of brain function; Neural networks and neurorobotics; Computer vision, based on human/animal physiology; Brain-computer interface (BCI) technology; Big data, databasing and data mining.