Multi-scale dual-channel feature embedding decoder for biomedical image segmentation

Computer Methods and Programs in Biomedicine · Published 2024-10-18 · DOI: 10.1016/j.cmpb.2024.108464
Impact factor 4.8 · CAS Region 2 (Medicine) · JCR Q1, Computer Science, Interdisciplinary Applications

Rohit Agarwal, Palash Ghosal, Anup K. Sadhu, Narayan Murmu, Debashis Nandi

Volume 257, Article 108464. Full text: https://www.sciencedirect.com/science/article/pii/S0169260724004577

Citations: 0

Abstract

Background and Objective:

Capturing global context together with local dependencies is essential for highly accurate segmentation of objects in image frames, and it remains difficult to achieve when developing deep learning-based biomedical image segmentation methods. Several transformer-based models have been proposed to address this issue in biomedical image segmentation. Nevertheless, segmentation accuracy remains an open challenge: these models often fall short of the target range because of their limited capacity to capture critical local and global context. In addition, the computational complexity of self-attention, which grows quadratically with the number of image tokens, is the main limitation of these models. Moreover, a large dataset is required to train them.
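To make the quadratic cost concrete, the Swin Transformer paper (Liu et al., 2021), not this article, gives the multiply-add counts of global multi-head self-attention (MSA) and window-based self-attention (W-MSA) over an h × w token grid with C channels and non-overlapping M × M windows: Ω(MSA) = 4hwC² + 2(hw)²C and Ω(W-MSA) = 4hwC² + 2M²hwC. The sketch below simply evaluates those two formulas; C = 96 and M = 7 are the Swin-T defaults and are assumptions here, since the abstract does not state the article's own configuration. Softmax and normalization costs are ignored, as in the original formulas.

```python
def msa_flops(h: int, w: int, c: int) -> int:
    """Multiply-adds of global self-attention on an h*w token grid (Swin paper formula)."""
    n = h * w  # number of tokens
    # 4nC^2 for the Q/K/V/output projections, 2n^2C for QK^T and attention*V
    return 4 * n * c * c + 2 * n * n * c


def wmsa_flops(h: int, w: int, c: int, m: int = 7) -> int:
    """Multiply-adds of window self-attention with non-overlapping m*m windows."""
    n = h * w
    # Same projections; attention is restricted to m*m tokens per window,
    # so the second term becomes linear in n.
    return 4 * n * c * c + 2 * m * m * n * c


if __name__ == "__main__":
    # Assumed Swin-T-like settings: C = 96 channels, 7x7 windows.
    for side in (32, 64, 128):
        g = msa_flops(side, side, 96)
        wl = wmsa_flops(side, side, 96)
        print(f"{side}x{side}: global={g:.3e}  window={wl:.3e}  ratio={g / wl:.1f}")
```

The ratio between the two counts grows with the grid size, because the attention term is quadratic in hw for global MSA but linear in hw for W-MSA; when a single window covers the whole grid (M² = hw) the two formulas coincide.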

Methods:

In this paper, we propose a novel multi-scale dual-channel decoder to mitigate these issues. The complete segmentation model uses two parallel encoders and a dual-channel decoder. The encoders are convolutional networks that capture features of the input images at multiple levels and scales. The decoder comprises a hierarchy of Attention-gated Swin Transformers with a fine-tuning strategy. These hierarchical Attention-gated Swin Transformers implement a multi-scale, multi-level feature embedding strategy that captures short- and long-range dependencies and exploits the necessary features without increasing the computational load. At the final stage of the decoder, the fine-tuning strategy refines the features to retain the rich features and reduce the possibility of over-segmentation.
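The abstract does not say how the attention gates in the decoder are computed. One common design, and the one assumed in this sketch, is the additive attention gate of Attention U-Net (Oktay et al., 2018): a coarser gating signal g and a skip-connection feature map x are projected by 1 × 1 convolutions, summed, passed through ReLU, reduced to one logit per pixel, and squashed by a sigmoid into coefficients that rescale x. The function name `attention_gate`, the weight shapes, and the assumption that g is already resampled to x's spatial size are all illustrative; this is not the authors' module, and the Swin Transformer blocks themselves are not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_gate(x, g, w_x, w_g, w_psi, b=0.0):
    """Additive attention gate (Attention U-Net style), hypothetical sketch.

    x:     skip-connection features, shape (C_x, H, W)
    g:     gating signal from a coarser decoder stage, shape (C_g, H, W),
           assumed already resampled to x's spatial size
    w_x:   (F, C_x) weights of a 1x1 convolution on x
    w_g:   (F, C_g) weights of a 1x1 convolution on g
    w_psi: (F,) weights of the 1x1 convolution giving one logit per pixel
    Returns the gated features (same shape as x) and the (H, W) attention map.
    """
    # A 1x1 convolution is a channel-mixing matrix product at every pixel.
    theta_x = np.einsum("fc,chw->fhw", w_x, x)
    phi_g = np.einsum("fc,chw->fhw", w_g, g)
    q = np.maximum(theta_x + phi_g, 0.0)  # ReLU on the summed projections
    alpha = sigmoid(np.einsum("f,fhw->hw", w_psi, q) + b)  # coefficients in (0, 1)
    # Broadcast the single-channel map over all channels of x.
    return x * alpha[None, :, :], alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 8, 8))
    g = rng.standard_normal((32, 8, 8))
    out, alpha = attention_gate(
        x, g,
        rng.standard_normal((8, 16)),
        rng.standard_normal((8, 32)),
        rng.standard_normal(8),
    )
    print(out.shape, float(alpha.min()), float(alpha.max()))
```

Because the gate outputs one coefficient per pixel in (0, 1), it can suppress irrelevant regions of the skip features before they are fused in the decoder, at the cost of only three 1 × 1 projections.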

Results:

The proposed model is evaluated on the publicly available LiTS and 3DIRCADb datasets and on the spleen dataset from the Medical Segmentation Decathlon. The model is also evaluated on a private dataset from Medical College Kolkata, India. We observe that the proposed model outperforms state-of-the-art models on the evaluation metrics for liver tumor and spleen segmentation, at a comparable computational cost.
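The abstract does not name its evaluation metrics. The Dice similarity coefficient and intersection-over-union (Jaccard index) are the usual overlap metrics for liver tumor and spleen segmentation, so the sketch below assumes them: Dice = 2|P ∩ T| / (|P| + |T|) and IoU = |P ∩ T| / |P ∪ T| for a predicted mask P and ground-truth mask T. The helper name `dice_and_iou`, the small `eps`, and the convention that two empty masks score 1.0 are illustrative assumptions, not details taken from the article.

```python
import numpy as np


def dice_and_iou(pred, target, eps=1e-7):
    """Dice coefficient and IoU for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()  # |P ∩ T|
    p, t = pred.sum(), target.sum()             # |P|, |T|
    # eps avoids 0/0 and makes two empty masks count as a perfect match
    dice = (2.0 * inter + eps) / (p + t + eps)
    iou = (inter + eps) / (p + t - inter + eps)  # |P ∪ T| = |P| + |T| - |P ∩ T|
    return float(dice), float(iou)


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    target = [[1, 0, 0], [0, 1, 1]]
    print(dice_and_iou(pred, target))
```

For a single case the two metrics are linked by Dice = 2·IoU / (1 + IoU), so they rank predictions identically there; averages over many cases can still differ.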

Conclusion:

The novel dual-channel decoder embeds multi-scale features and efficiently builds a representation of both short- and long-range context. It also refines the features at the final stage so that only the necessary features are retained. As a result, we achieve better segmentation performance than state-of-the-art models.


Source journal

Computer methods and programs in biomedicine (Engineering & Technology: Engineering, Biomedical)

CiteScore: 12.30
Self-citation rate: 6.60%
Articles published: 601
Review time: 135 days

Journal description: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.