HAD-Net: An attention U-based network with hyper-scale shifted aggregating and max-diagonal sampling for medical image segmentation

IF 4.3 | Zone 3 (Computer Science) | JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Computer Vision and Image Understanding | Pub Date: 2024-09-07 | DOI: 10.1016/j.cviu.2024.104151
Junding Sun, Yabei Li, Xiaosheng Wu, Chaosheng Tang, Shuihua Wang, Yudong Zhang
Volume 249, Article 104151 | Journal Article
Citations: 0

Abstract

Objectives:

Accurately extracting regions of interest (ROIs) with variable shapes and scales is one of the primary challenges in medical image segmentation. Most current U-based networks aggregate multi-stage encoding outputs into an improved multi-scale skip connection. Although this design has been shown to provide scale diversity and contextual integrity, several intuitive limitations remain: (i) the encoding outputs are simply resampled to the same size, which destroys fine-grained information, so the advantages of using multiple scales are not fully exploited; (ii) redundant information proportional to the feature dimension size is introduced and causes multi-stage interference; and (iii) the precision of information delivery relies on the up-sampling and down-sampling layers, but guidance on maintaining consistency in feature locations and trends between them is lacking.
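Limitation (i) can be illustrated with a minimal sketch of the baseline aggregation described above: every encoder stage is resampled to one common size and stacked. This is not the authors' code. The function names, the nearest-neighbour resampling, and the toy single-channel feature maps are all assumptions chosen for illustration.

```python
def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resample of a 2-D feature map (a list of rows)."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def aggregate_skip(stage_maps, out_h, out_w):
    """Resample every encoder stage to one size and stack them as channels."""
    return [resize_nearest(m, out_h, out_w) for m in stage_maps]


# Hypothetical single-channel encoder outputs at three scales.
stage1 = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4, finest
stage2 = [[10, 20], [30, 40]]                                # 2x2
stage3 = [[99]]                                              # 1x1, coarsest

fused = aggregate_skip([stage1, stage2, stage3], 2, 2)

# Count how many distinct fine-scale values survive the resampling.
kept = {v for row in fused[0] for v in row}
print(fused)
print(f"fine-scale values kept: {len(kept)} of 16")
```

In this toy case only 4 of the 16 finest-scale values reach the fused tensor, which is the kind of fine-grained loss the abstract attributes to simple same-size resampling.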

Methods:

To address these limitations, this paper proposes a U-based CNN named HAD-Net. It assembles a new hyper-scale shifted aggregating module (HSAM) paradigm and progressive reusing attention (PRA) for skip connections, and employs a novel pair of dual-branch, parameter-free sampling layers: max-diagonal pooling (MDP) and max-diagonal un-pooling (MDUP). First, the aggregating scheme additionally combines five subregions with certain offsets in the shallower stage, since the lower scale-down ratios of the subregions enrich the scales and the fine-grained context. Then, the attention scheme, which contains a partial-to-global channel attention (PGCA) and a multi-scale reusing spatial attention (MRSA), builds reusing connections internally and adjusts the focus toward more useful dimensions. Finally, MDP and MDUP are used in pairs to improve texture delivery and feature consistency, enhancing information retention and avoiding positional confusion.
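The abstract does not define MDP and MDUP, so they are not reproduced here. As a loose analogy only, the sketch below shows the standard index-preserving 2x2 max pooling and its matching un-pooling, a well-known parameter-free pair in which the up-sampling layer reuses positions recorded by the down-sampling layer. The function names and the toy input are assumptions, and the map is assumed to have even height and width.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling that also records where each maximum came from."""
    pooled, indices = [], []
    for r in range(0, len(fmap), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(fmap[0]), 2):
            window = [
                (fmap[r + dr][c + dc], (r + dr, c + dc))
                for dr in (0, 1)
                for dc in (0, 1)
            ]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, out_h, out_w):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    out = [[0] * out_w for _ in range(out_h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


# Hypothetical 4x4 feature map.
fmap = [
    [1, 5, 2, 0],
    [3, 4, 7, 6],
    [9, 8, 1, 2],
    [0, 1, 3, 4],
]
pooled, indices = max_pool_2x2(fmap)
restored = max_unpool_2x2(pooled, indices, 4, 4)
print(pooled)
print(restored)
```

Because the un-pooling step reuses the recorded positions, each retained value returns to its original location. This is one simple way to keep feature locations consistent between paired sampling layers; how MDP and MDUP achieve this is described in the paper itself.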

Results:

Compared with state-of-the-art networks, HAD-Net achieves comparable or better performance, with per-class Dice scores of 90.13%, 81.51%, and 75.43% on BraTS20, 89.59% Dice and 98.56% AUC on Kvasir-SEG, and 82.17% Dice and 98.05% AUC on DRIVE.
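For readers unfamiliar with the Dice metric reported above, the following minimal sketch computes it for binary masks as 2|P ∩ T| / (|P| + |T|). The function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions. The paper's exact evaluation protocol is not given in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as lists of 0/1 rows."""
    intersection = sum(
        p * t
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    total = sum(map(sum, pred)) + sum(map(sum, target))
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical predicted and ground-truth masks.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]

score = dice_score(pred, target)
print(f"Dice = {score:.4f}")
```

Here the masks share 2 foreground pixels and each contains 3, so the score is 4/6, about 0.667. A reported Dice of 90.13% corresponds to a value of 0.9013 on this scale.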

Conclusions:

The combined HSAM+PRA+MDP+MDUP scheme is shown to be a remarkable improvement and leaves room for further research.

Source journal
Computer Vision and Image Understanding (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 7.80
Self-citation rate: 4.40%
Publication volume: 112
Review time: 79 days
About the journal: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research areas include: theory; early vision; data structures and representations; shape; range; motion; matching and recognition; architecture and languages; vision systems.