Deep learning with uncertainty estimation for automatic tumor segmentation in PET/CT of head and neck cancers: impact of model complexity, image processing and augmentation.
Bao Ngoc Huynh, Aurora Rosvoll Groendahl, Oliver Tomic, Kristian Hovde Liland, Ingerid Skjei Knudtsen, Frank Hoebers, Wouter van Elmpt, Einar Dale, Eirik Malinen, Cecilia Marie Futsaether
{"title":"Deep learning with uncertainty estimation for automatic tumor segmentation in PET/CT of head and neck cancers: impact of model complexity, image processing and augmentation.","authors":"Bao Ngoc Huynh, Aurora Rosvoll Groendahl, Oliver Tomic, Kristian Hovde Liland, Ingerid Skjei Knudtsen, Frank Hoebers, Wouter van Elmpt, Einar Dale, Eirik Malinen, Cecilia Marie Futsaether","doi":"10.1088/2057-1976/ad6dcd","DOIUrl":null,"url":null,"abstract":"<p><p><i>Objective.</i>Target volumes for radiotherapy are usually contoured manually, which can be time-consuming and prone to inter- and intra-observer variability. Automatic contouring by convolutional neural networks (CNN) can be fast and consistent but may produce unrealistic contours or miss relevant structures. We evaluate approaches for increasing the quality and assessing the uncertainty of CNN-generated contours of head and neck cancers with PET/CT as input.<i>Approach.</i>Two patient cohorts with head and neck squamous cell carcinoma and baseline<sup>18</sup>F-fluorodeoxyglucose positron emission tomography and computed tomography images (FDG-PET/CT) were collected retrospectively from two centers. The union of manual contours of the gross primary tumor and involved nodes was used to train CNN models for generating automatic contours. The impact of image preprocessing, image augmentation, transfer learning and CNN complexity, architecture, and dimension (2D or 3D) on model performance and generalizability across centers was evaluated. A Monte Carlo dropout technique was used to quantify and visualize the uncertainty of the automatic contours.<i>Main results</i>. CNN models provided contours with good overlap with the manually contoured ground truth (median Dice Similarity Coefficient: 0.75-0.77), consistent with reported inter-observer variations and previous auto-contouring studies. Image augmentation and model dimension, rather than model complexity, architecture, or advanced image preprocessing, had the largest impact on model performance and cross-center generalizability. Transfer learning on a limited number of patients from a separate center increased model generalizability without decreasing model performance on the original training cohort. High model uncertainty was associated with false positive and false negative voxels as well as low Dice coefficients.<i>Significance.</i>High quality automatic contours can be obtained using deep learning architectures that are not overly complex. Uncertainty estimation of the predicted contours shows potential for highlighting regions of the contour requiring manual revision or flagging segmentations requiring manual inspection and intervention.</p>","PeriodicalId":8896,"journal":{"name":"Biomedical Physics & Engineering Express","volume":" ","pages":""},"PeriodicalIF":1.3000,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biomedical Physics & Engineering Express","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1088/2057-1976/ad6dcd","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
引用次数: 0
Abstract
Objective.Target volumes for radiotherapy are usually contoured manually, which can be time-consuming and prone to inter- and intra-observer variability. Automatic contouring by convolutional neural networks (CNN) can be fast and consistent but may produce unrealistic contours or miss relevant structures. We evaluate approaches for increasing the quality and assessing the uncertainty of CNN-generated contours of head and neck cancers with PET/CT as input.Approach.Two patient cohorts with head and neck squamous cell carcinoma and baseline18F-fluorodeoxyglucose positron emission tomography and computed tomography images (FDG-PET/CT) were collected retrospectively from two centers. The union of manual contours of the gross primary tumor and involved nodes was used to train CNN models for generating automatic contours. The impact of image preprocessing, image augmentation, transfer learning and CNN complexity, architecture, and dimension (2D or 3D) on model performance and generalizability across centers was evaluated. A Monte Carlo dropout technique was used to quantify and visualize the uncertainty of the automatic contours.Main results. CNN models provided contours with good overlap with the manually contoured ground truth (median Dice Similarity Coefficient: 0.75-0.77), consistent with reported inter-observer variations and previous auto-contouring studies. Image augmentation and model dimension, rather than model complexity, architecture, or advanced image preprocessing, had the largest impact on model performance and cross-center generalizability. Transfer learning on a limited number of patients from a separate center increased model generalizability without decreasing model performance on the original training cohort. High model uncertainty was associated with false positive and false negative voxels as well as low Dice coefficients.Significance.High quality automatic contours can be obtained using deep learning architectures that are not overly complex. Uncertainty estimation of the predicted contours shows potential for highlighting regions of the contour requiring manual revision or flagging segmentations requiring manual inspection and intervention.
期刊介绍:
BPEX is an inclusive, international, multidisciplinary journal devoted to publishing new research on any application of physics and/or engineering in medicine and/or biology. Characterized by a broad geographical coverage and a fast-track peer-review process, relevant topics include all aspects of biophysics, medical physics and biomedical engineering. Papers that are almost entirely clinical or biological in their focus are not suitable. The journal has an emphasis on publishing interdisciplinary work and bringing research fields together, encompassing experimental, theoretical and computational work.