HCT-Unet: multi-target medical image segmentation via a hybrid CNN-transformer Unet incorporating multi-axis gated multi-layer perceptron

Yazhuo Fan, Jianhua Song, Lei Yuan, Yunlin Jia
Journal: The Visual Computer
DOI: 10.1007/s00371-024-03612-y (https://doi.org/10.1007/s00371-024-03612-y)
Published: 2024-09-06 (Journal Article)

Abstract

In recent years, network structures combining convolutional neural networks (CNNs) and Transformers have been built to exploit the complementary strengths of the two methods in medical image segmentation. However, most of these methods integrate the CNN and the Transformer at only a single level and therefore cannot extract low-level detail features and high-level abstract information simultaneously. Moreover, such structures lack flexibility and cannot dynamically adjust the contributions of different feature maps. To address these limitations, we introduce HCT-Unet, a hybrid CNN-Transformer model specifically designed for multi-organ medical image segmentation. HCT-Unet introduces a tunable hybrid paradigm that differs significantly from conventional hybrid architectures: at each stage, it deploys a powerful CNN to capture short-range information and a Transformer to extract long-range information. Furthermore, we designed a multi-functional multi-scale fusion bridge that progressively integrates information from different scales and dynamically adjusts the attention weights of local and global features. With these two innovative designs, HCT-Unet demonstrates robust discriminative and representational capabilities in multi-target medical imaging tasks. Experimental results show the strong performance of our approach in medical image segmentation: in multi-organ segmentation, HCT-Unet achieved a Dice similarity coefficient (DSC) of 82.23%, and in cardiac segmentation it reached a DSC of 91%, significantly outperforming previous state-of-the-art networks. The code has been released on Zenodo: https://zenodo.org/doi/10.5281/zenodo.11070837.
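The fusion bridge described above dynamically weights local (CNN) and global (Transformer) features rather than summing them with fixed coefficients. The sketch below is a hypothetical, simplified illustration of that idea, not the authors' implementation (for which see the Zenodo release): a sigmoid gate, computed per spatial position from both branches, produces a convex combination of the two feature maps. The function names, shapes, and the single-weight-vector gate are illustrative assumptions only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(local_feat, global_feat, w, b=0.0):
    """Fuse a CNN (local) branch and a Transformer (global) branch with a
    learned per-position gate, so each location picks its own mix of the two.

    local_feat, global_feat: (H, W, C) feature maps.
    w: (2*C,) gate weights; b: scalar gate bias.
    """
    # The gate sees both branches' features at each spatial position.
    stacked = np.concatenate([local_feat, global_feat], axis=-1)   # (H, W, 2C)
    g = sigmoid(stacked @ w + b)[..., None]                        # (H, W, 1)
    # Convex combination: g weights the local branch, (1 - g) the global one.
    return g * local_feat + (1.0 - g) * global_feat

rng = np.random.default_rng(0)
H, W, C = 4, 4, 8
local_feat = rng.normal(size=(H, W, C))
global_feat = rng.normal(size=(H, W, C))
w = rng.normal(size=2 * C) * 0.1
fused = gated_fusion(local_feat, global_feat, w)
print(fused.shape)  # (4, 4, 8)
```

Because the gate output lies in (0, 1), every fused value stays between the corresponding local and global values; in the full model, the gate parameters would be learned end-to-end so the network can shift attention between short-range and long-range cues per feature map.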
