Improved dual-aggregation polyp segmentation network combining a pyramid vision transformer with a fully convolutional network

Biomedical Optics Express · Published 2024-03-26 · DOI: 10.1364/boe.510908 · IF 3.2 · CAS Zone 2 (Medicine) · JCR Q2 (BIOCHEMICAL RESEARCH METHODS)

Feng Li, Zetao Huang, Lu Zhou, Yuyang Chen, Shiqing Tang, Pengchao Ding, Haixia Peng, and Yimin Chu

Citations: 0

Abstract

Automatic and precise polyp segmentation in colonoscopy images is highly valuable for the early diagnosis and surgical treatment of colorectal cancer. Nevertheless, it remains a major challenge because polyps vary in size and have intricate morphological characteristics, and the boundary between polyps and the surrounding mucosa is often indistinct. To alleviate these challenges, we proposed an improved dual-aggregation polyp segmentation network, dubbed Dua-PSNet, for automatic and accurate full-size polyp prediction, which combines a transformer branch and a fully convolutional network (FCN) branch in parallel. Concretely, in the transformer branch, we adopted the B3 variant of the pyramid vision transformer v2 (PVTv2-B3) as the image encoder to capture multi-scale global features and model long-range dependencies between them, and we designed a multi-stage feature aggregation decoder (MFAD) to highlight critical local feature details and effectively integrate them into the global features. In the decoder, the adaptive feature aggregation (AFA) block was constructed to fuse the high-level feature representations of different scales generated by the PVTv2-B3 encoder in a stepwise, adaptive manner, refining the global semantic information, while the ResidualBlock module was devised to mine detailed boundary cues hidden in low-level features. The selective global-to-local fusion head (SGLFH) module then selectively aggregated these boundary details with the global semantic features, strengthening the hierarchical features to cope with scale variations of polyps. The FCN branch, which incorporates the designed ResidualBlock module, was used to extract highly fused fine-grained features, which are combined with the outputs of the transformer branch to produce full-size segmentation maps.
In this way, the two branches influence and complement each other, enhancing the discriminability of polyp features and enabling more accurate full-size segmentation maps. Extensive experiments on five challenging polyp segmentation benchmarks demonstrated that the proposed Dua-PSNet has strong learning and generalization ability and achieves state-of-the-art segmentation performance compared with existing cutting-edge methods. These results suggest that Dua-PSNet is a promising solution for practical polyp segmentation tasks, in which the data typically vary widely.
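
The abstract names the decoder modules but does not give their internals. As a rough illustration only, the NumPy sketch below mimics two of the described ideas on single-channel 2-D maps: a stepwise, weighted top-down fusion of a feature pyramid (in the spirit of the AFA block) and a per-pixel sigmoid gate that selectively mixes a global semantic map with a local detail map (in the spirit of the SGLFH). The functions `stepwise_aggregate` and `gated_fusion`, the nearest-neighbour upsampling, the scalar fusion weights, and the gate formula are all hypothetical choices, not the authors' implementation; real layers would operate on multi-channel tensors with learned convolutions. A `dice` helper, the Dice coefficient commonly used to score polyp segmentation, is included to evaluate a binary prediction.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a 2-D map (an assumed choice)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def stepwise_aggregate(pyramid, alpha_logits):
    """Hypothetical AFA-like fusion, working from the deepest level upwards.

    pyramid: 2-D maps ordered shallow -> deep, each half the size of the previous one.
    alpha_logits: one scalar per fusion step; a real model would learn these.
    """
    assert len(alpha_logits) == len(pyramid) - 1, "one weight per fusion step"
    fused = pyramid[-1]
    for level, logit in zip(reversed(pyramid[:-1]), alpha_logits):
        a = sigmoid(logit)  # adaptive mixing weight in (0, 1)
        fused = a * upsample2x(fused) + (1.0 - a) * level
    return fused


def gated_fusion(global_map, local_map, w_g=1.0, w_l=1.0, bias=0.0):
    """Hypothetical SGLFH-like selective fusion with a per-pixel sigmoid gate.

    A gate near 1 keeps the local (boundary) detail; a gate near 0 keeps the
    global semantic map. The gate formula here is an illustrative assumption.
    """
    gate = sigmoid(w_g * global_map + w_l * local_map + bias)
    return gate * local_map + (1.0 - gate) * global_map


def dice(pred_mask, gt_mask, eps=1e-7):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three toy "encoder stages" at 16x16, 8x8 and 4x4 resolution.
    pyramid = [rng.standard_normal((16 // 2**i, 16 // 2**i)) for i in range(3)]
    global_map = stepwise_aggregate(pyramid, alpha_logits=[0.0, 0.0])
    local_map = rng.standard_normal((16, 16))  # stand-in for low-level boundary cues
    mask = sigmoid(gated_fusion(global_map, local_map)) > 0.5
    print(global_map.shape, mask.shape)  # (16, 16) (16, 16)
    print(dice(mask, mask))  # identical masks score 1.0
```

In a trained network the fusion weights and gate parameters would be learned from data; they are fixed here so that the arithmetic of each fusion step is easy to follow.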

Source journal: Biomedical Optics Express (BIOCHEMICAL RESEARCH METHODS; OPTICS)

CiteScore: 6.80

Self-citation rate: 11.80%

Articles published: 633

Review time: 1 month

Journal description: The journal's scope encompasses fundamental research, technology development, biomedical studies, and clinical applications. BOEx focuses on leading-edge topics in the field, including tissue optics and spectroscopy; novel microscopies; optical coherence tomography; diffuse and fluorescence tomography; photoacoustic and multimodal imaging; molecular imaging and therapies; nanophotonic biosensing; optical biophysics/photobiology; microfluidic optical devices; and vision research.