Xiao Lou, Juan Zhu, Jian Yang, Youzhe Zhu, Huazhong Shu, Baosheng Li
Enhanced Cross-stage-attention U-Net for esophageal target volume segmentation
BMC Medical Imaging 24(1):339. Published 2024-12-18. DOI: https://doi.org/10.1186/s12880-024-01515-x
Abstract
Purpose: Segmenting the target volume and organs at risk (OAR) is an essential part of radiotherapy. Determining the location and extent of the esophagus in simulation computed tomography (CT) images is difficult and time-consuming, primarily because of its complex structure and low contrast with the surrounding tissues. This study proposes an Enhanced Cross-stage-attention U-Net to segment the esophageal gross tumor volume (GTV) and clinical target volume (CTV) in CT images.
Methods: First, a module based on principal component analysis (PCA) was constructed to pre-extract features from the input image. Then, a cross-stage feature fusion module was designed to replace the skip concatenation of the original U-Net. This module consists of a Wide Range Attention (WRA) unit, a Small-kernel Local Attention (SLA) unit, and an Inverted Bottleneck (IBN) unit. WRA captures global attention, and its large convolution kernel is decomposed to reduce computation. SLA complements WRA with local attention. IBN fuses the extracted features and contains a global frequency response layer that redistributes the frequency response of the fused feature maps.
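The abstract does not say how the large WRA kernel is decomposed. As a rough illustration only, the sketch below assumes the split commonly used in large-kernel attention designs: a K x K depthwise convolution is approximated by a small (2d-1) x (2d-1) depthwise convolution followed by a ceil(K/d) x ceil(K/d) depthwise convolution with dilation d (a 1 x 1 pointwise convolution usually follows and is not counted here). The function names and the values K = 21, d = 3 are hypothetical and are not taken from the paper.

```python
from math import ceil


def decompose_large_kernel(kernel_size, dilation):
    """Kernel sizes of the two depthwise convolutions that stand in for one
    kernel_size x kernel_size depthwise convolution.

    Assumed scheme (not confirmed by the abstract): a local
    (2d-1) x (2d-1) convolution, then a ceil(K/d) x ceil(K/d)
    convolution applied with dilation d.
    """
    local_kernel = 2 * dilation - 1
    dilated_kernel = ceil(kernel_size / dilation)
    return local_kernel, dilated_kernel


def depthwise_weights(kernel_size, dilation):
    """Per-channel weight counts (biases and the 1 x 1 convolution excluded):
    (direct large kernel, decomposed pair)."""
    local_kernel, dilated_kernel = decompose_large_kernel(kernel_size, dilation)
    direct = kernel_size ** 2
    decomposed = local_kernel ** 2 + dilated_kernel ** 2
    return direct, decomposed


def covered_span(kernel_size, dilation):
    """Receptive field, in pixels along one axis, of the stacked pair."""
    local_kernel, dilated_kernel = decompose_large_kernel(kernel_size, dilation)
    return local_kernel + (dilated_kernel - 1) * dilation


# Illustrative values only: K = 21, d = 3.
print(decompose_large_kernel(21, 3))  # (5, 7)
print(depthwise_weights(21, 3))       # (441, 74)
print(covered_span(21, 3))            # 23
```

Under these assumptions the two small kernels span at least the same neighbourhood as the single large kernel while storing about one sixth of the per-channel weights, which is the kind of saving the Methods paragraph refers to.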
Results: The proposed method was compared with previously published esophageal segmentation methods. For GTV, the proposed network achieved a mean surface distance (MSD) of 2.83 (1.62, 4.76) mm, a Hausdorff distance (HD) of 11.79 ± 6.02 mm, and a Dice coefficient (DC) of 72.45 ± 19.18%. For CTV, it achieved MSD = 5.26 (2.18, 8.82) mm, HD = 16.22 ± 10.01 mm, and DC = 71.06 ± 17.72%.
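For readers unfamiliar with the reported metrics, the sketch below shows how two of them can be computed on toy data, assuming DC denotes the Dice coefficient and HD the symmetric Hausdorff distance. It works on sets of pixel coordinates in pixel units. Clinical evaluations typically measure HD on contour or surface points and scale by voxel spacing to obtain millimetres, so this is a simplified illustration and not the authors' evaluation code.

```python
from math import dist


def dice_coefficient(pred, truth):
    """Dice overlap between two sets of pixel coordinates: 2|A∩B| / (|A|+|B|)."""
    if not pred and not truth:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))


# Toy 2D masks: a 4 x 4 square and the same square shifted down by one row.
truth = {(r, c) for r in range(2, 6) for c in range(2, 6)}
pred = {(r, c) for r in range(3, 7) for c in range(2, 6)}

print(dice_coefficient(pred, truth))    # 0.75 (12 shared pixels, 16 + 16 total)
print(hausdorff_distance(pred, truth))  # 1.0 (worst mismatch is one pixel)
```

The two metrics are complementary: Dice summarises volumetric overlap, whereas the Hausdorff distance is driven by the single worst boundary error.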
Conclusion: Redesigning the skip concatenation in U-Net improved esophageal segmentation performance. The results indicate that the proposed network segments the esophageal GTV and CTV more effectively than the compared methods.
About the journal:
BMC Medical Imaging is an open access journal publishing original peer-reviewed research articles in the development, evaluation, and use of imaging techniques and image processing tools to diagnose and manage disease.