Boosting Convolution With Efficient MLP-Permutation for Volumetric Medical Image Segmentation

Yi Lin, Xiao Fang, Dong Zhang, Kwang-Ting Cheng, Hao Chen
IEEE Transactions on Medical Imaging, vol. 44, no. 5, pp. 2341–2352, published 2025-01-16.
DOI: 10.1109/TMI.2025.3530113
Code: https://github.com/xiaofang007/PHNet

Abstract

Recently, the advent of the Vision Transformer (ViT) has brought substantial advances on 3D benchmarks, particularly in 3D volumetric medical image segmentation (Vol-MedSeg). Concurrently, multi-layer perceptron (MLP) networks have regained popularity among researchers owing to results comparable to ViT's, while dispensing with the resource-intensive self-attention module. In this work, we propose a novel permutable hybrid network for Vol-MedSeg, named PHNet, which capitalizes on the strengths of both convolutional neural networks (CNNs) and MLPs. PHNet addresses the intrinsic anisotropy of 3D volumetric data by employing a combination of 2D and 3D CNNs to extract local features. In addition, we propose an efficient multi-layer permute perceptron (MLPP) module that captures long-range dependencies while preserving positional information. This is achieved through an axis-decomposition operation that permutes the input tensor along different axes, enabling the positional information of each axis to be encoded separately. Furthermore, MLPP tackles the resolution sensitivity of MLPs in Vol-MedSeg with a token-segmentation operation, which divides the feature map into smaller tokens and processes them individually. Extensive experiments validate that PHNet outperforms state-of-the-art methods at lower computational cost on the widely used yet challenging COVID-19-20, Synapse, LiTS, and MSD BraTS benchmarks. An ablation study further demonstrates the effectiveness of PHNet in harnessing the strengths of both CNNs and MLPs. The code is available on GitHub: https://github.com/xiaofang007/PHNet.
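The two ideas in the abstract — permuting each spatial axis to the token dimension and splitting that axis into fixed-length segments so the layer is insensitive to input resolution — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); the function name `mlpp_sketch`, the single shared weight matrix, and the summation over axes are assumptions made for brevity:

```python
import numpy as np

def mlpp_sketch(x, seg_len=4, rng=None):
    """Illustrative axis-decomposed MLP with token segmentation.

    x: feature map of shape (C, D, H, W). For each spatial axis,
    the axis is moved to the last position, split into segments of
    `seg_len` tokens, and each segment is mixed by one small shared
    weight matrix, so the layer works for any axis length that is a
    multiple of `seg_len` (the resolution-insensitivity property).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    w = rng.standard_normal((seg_len, seg_len)) / np.sqrt(seg_len)

    out = np.zeros_like(x)
    for axis in (1, 2, 3):  # the D, H, W axes
        L = x.shape[axis]
        assert L % seg_len == 0, "axis length must be a multiple of seg_len"
        xm = np.moveaxis(x, axis, -1)          # (..., L)
        shape = xm.shape
        # split the axis into (L // seg_len) segments of seg_len tokens
        xm = xm.reshape(*shape[:-1], L // seg_len, seg_len)
        xm = xm @ w                            # mix only within each segment
        xm = xm.reshape(shape)
        out += np.moveaxis(xm, -1, axis)       # restore the original layout
    return out
```

Because the weight matrix acts only within a `seg_len`-token segment, the same layer applies unchanged to a 4×4×8 volume and an 8×8×8 one — whereas a plain MLP over a whole axis would fix the input resolution at training time.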