Head and Neck Cancer Segmentation in FDG PET Images: Performance Comparison of Convolutional Neural Networks and Vision Transformers.

IF 2.2 | Zone 4, Medicine | Q2, Radiology, Nuclear Medicine & Medical Imaging | Tomography | Pub Date: 2023-10-18 | DOI: 10.3390/tomography9050151

Xiaofan Xiong, Brian J Smith, Stephen A Graves, Michael M Graham, John M Buatti, Reinhard R Beichel

Tomography, vol. 9, no. 5, pp. 1933-1948. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10611182/pdf/

Abstract

Convolutional neural networks (CNNs) have a proven track record in medical image segmentation. Recently, Vision Transformers were introduced and are gaining popularity for many computer vision applications, including object detection, classification, and segmentation. Machine learning algorithms such as CNNs or Transformers are subject to an inductive bias, which can have a significant impact on the performance of machine learning models. This is especially relevant for medical image segmentation applications where limited training data are available, and a model's inductive bias should help it to generalize well. In this work, we quantitatively assess the performance of two CNN-based networks (U-Net and U-Net-CBAM) and three popular Transformer-based segmentation network architectures (UNETR, TransBTS, and VT-UNet) in the context of head and neck cancer (HNC) lesion segmentation in volumetric [F-18] fluorodeoxyglucose (FDG) PET scans. For performance assessment, 272 FDG PET-CT scans from a clinical trial (ACRIN 6685) were utilized, which include a total of 650 lesions (primary: 272 and secondary: 378). The image data used are highly diverse and representative of clinical use. For performance analysis, several error metrics were utilized. The achieved Dice coefficients ranged from 0.809 to 0.833, with the best performance achieved by CNN-based approaches. U-Net-CBAM, which utilizes spatial and channel attention, showed several advantages for smaller lesions compared to the standard U-Net. Furthermore, our results provide some insight regarding the image features relevant for this specific segmentation application. In addition, the results highlight the need to utilize primary as well as secondary lesions to derive clinically relevant segmentation performance estimates that avoid biases.
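For readers unfamiliar with the Dice coefficient reported above, the following is a minimal sketch of how it is commonly computed for a pair of binary segmentation masks. It is not the authors' evaluation code: the function name `dice_coefficient`, the flattened-list representation of the voxels, the empty-mask convention, and the toy values are all assumptions made for illustration.

```python
from typing import Iterable


def dice_coefficient(pred: Iterable[int], truth: Iterable[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat voxel sequences."""
    pred = [bool(v) for v in pred]
    truth = [bool(v) for v in truth]
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labeled as lesion in both masks.
    overlap = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Convention (an assumption): two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy 8-voxel example (made-up values, not data from the study).
predicted = [0, 1, 1, 1, 0, 0, 1, 0]
reference = [0, 1, 1, 0, 0, 1, 1, 0]
score = dice_coefficient(predicted, reference)
print(f"Dice = {score:.3f}")  # prints "Dice = 0.750"
```

A value of 1 means the predicted and reference masks coincide exactly, and 0 means they share no voxels. For real volumetric PET data, the same formula would typically be applied to array-based masks rather than Python lists.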

Source journal

Tomography (Medicine: Radiology, Nuclear Medicine and Imaging)

CiteScore: 2.70

Self-citation rate: 10.50%

Articles published: 222

About the journal: Tomography™ publishes basic (technical and pre-clinical) and clinical scientific articles which involve the advancement of imaging technologies. Tomography encompasses studies that use single or multiple imaging modalities including, for example, CT, US, PET, SPECT, MR and hyperpolarization technologies, as well as optical modalities (e.g., bioluminescence, photoacoustic, endomicroscopy, fiber optic imaging and optical computed tomography) in basic sciences, engineering, preclinical and clinical medicine. Tomography also welcomes studies involving exploration and refinement of contrast mechanisms and image-derived metrics within and across modalities toward the development of novel imaging probes for image-based feedback and intervention. The use of imaging in biology and medicine provides unparalleled opportunities to noninvasively interrogate tissues to obtain real-time dynamic and quantitative information required for diagnosis and response to interventions and to follow evolving pathological conditions. As multi-modal studies and the complexities of imaging technologies themselves are ever increasing to provide advanced information to scientists and clinicians, Tomography provides a unique publication venue allowing investigators the opportunity to more precisely communicate integrated findings related to the diverse and heterogeneous features associated with underlying anatomical, physiological, functional, metabolic and molecular genetic activities of normal and diseased tissue. Thus, Tomography publishes peer-reviewed articles which involve the broad use of imaging of any tissue and disease type, including both preclinical and clinical investigations. In addition, hardware/software along with chemical and molecular probe advances are welcome as they are deemed to significantly contribute towards the long-term goal of improving the overall impact of imaging on scientific and clinical discovery.