MCPA: multi-scale cross perceptron attention network for 2D medical image segmentation

Complex & Intelligent Systems (IF 5; Zone 2, Computer Science; Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) Pub Date: 2024-12-19 DOI: 10.1007/s40747-024-01671-1

Liang Xu, Mingxiao Chen, Yi Cheng, Pengwu Song, Pengfei Shao, Shuwei Shen, Peng Yao, Ronald X. Xu


Citations: 0

Abstract

The UNet architecture, based on convolutional neural networks (CNNs), has demonstrated remarkable performance in medical image analysis. However, it struggles to capture long-range dependencies because of the limited receptive fields and inherent bias of convolutional operations. Recently, numerous Transformer-based techniques have been incorporated into the UNet architecture to overcome this limitation by capturing global feature correlations. However, integrating Transformer modules may cause local contextual information to be lost during global feature fusion. In this work, we propose a 2D medical image segmentation model called the multi-scale cross perceptron attention network (MCPA). MCPA consists of three main components: an encoder, a decoder, and a Cross Perceptron. The Cross Perceptron first captures local correlations using multiple Multi-scale Cross Perceptron modules, which facilitates the fusion of features across scales. The resulting multi-scale feature vectors are then spatially unfolded, concatenated, and fed through a Global Perceptron module to model global dependencies. Considering the high computational cost of 3D neural network models, and the fact that many important clinical data can only be obtained in two dimensions, MCPA focuses on 2D medical image segmentation. Furthermore, we introduce a progressive dual-branch structure (PDBS) to address the semantic segmentation of images involving finer tissue structures. This structure gradually shifts the segmentation focus of MCPA training from large-scale structural features to more sophisticated pixel-level features. We evaluate MCPA on several publicly available medical image datasets from different tasks and devices, including large-scale open CT (Synapse) and MRI (ACDC) datasets, and widely used 2D medical imaging datasets captured by fundus camera (DRIVE, CHASE_DB1, HRF) and OCTA (ROSE). The experimental results show that MCPA achieves state-of-the-art performance.
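The "spatially unfolded, concatenated, and fed through a Global Perceptron" step can be illustrated with a small pure-Python sketch. The abstract does not specify the internals of the Global Perceptron, so plain scaled dot-product self-attention with identity query/key/value projections is used here as a stand-in. The function names (`unfold`, `global_attention`, `fuse_scales`) are hypothetical, and the sketch assumes every scale has already been projected to the same channel count.

```python
import math


def unfold(feature_map):
    """Flatten a [C][H][W] feature map into H*W tokens, each of length C."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][y][x] for c in range(channels)]
        for y in range(height)
        for x in range(width)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(tokens):
    """Stand-in for the Global Perceptron (assumption, not the paper's module):
    scaled dot-product self-attention with identity Q/K/V projections."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    mixed = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        mixed.append([
            sum(w * value[d] for w, value in zip(weights, tokens))
            for d in range(dim)
        ])
    return mixed


def fuse_scales(feature_maps):
    """Unfold each scale, concatenate the tokens, then mix them globally."""
    tokens = []
    for fmap in feature_maps:
        tokens.extend(unfold(fmap))
    return global_attention(tokens)


# Toy input: two scales with 2 channels each, 2x2 and 1x1 spatial size.
fine = [[[0.0, 1.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]]
coarse = [[[1.0]], [[0.0]]]
fused = fuse_scales([fine, coarse])
print(len(fused), len(fused[0]))
```

Because the tokens from all scales sit in one sequence, every output token is a weighted mix of positions from both the fine and the coarse map, which is the sense in which the concatenated sequence lets a single module model global, cross-scale dependencies. A real implementation would use learned projections and multiple heads.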




Source journal

Complex & Intelligent Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 9.60

Self-citation rate: 10.30%

Articles published: 297

Journal description: Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.