Background and objective
Transcranial sonography-based grading of Parkinson's Disease has gained increasing attention in recent years and is currently used to assist differential diagnosis in some specialized centers. Accurate midbrain segmentation is considered an important initial step in this workflow. However, current practice is manual, time-consuming, and prone to bias owing to its subjective nature. Relevant studies in the literature are scarce and lack comprehensive model evaluation from an application perspective. Herein, we aimed to benchmark U-Net-based models and identify the best-performing one for objective, stable and robust midbrain auto-segmentation on transcranial sonography images.
Methods
A total of 584 patients with suspected Parkinson's Disease were retrospectively enrolled from Beijing Tiantan Hospital. The dataset was divided into training (n = 416), validation (n = 104), and testing (n = 64) sets. Three state-of-the-art deep-learning networks (U-Net, U-Net+++, and nnU-Net) were used to develop segmentation models, with 5-fold cross-validation and three randomization seeds to safeguard model validity and stability. Model evaluation was conducted on the testing set in four key aspects: (i) segmentation agreement, using the DICE coefficient (DICE), Intersection over Union (IoU), and Hausdorff Distance (HD); (ii) model stability, using the standard deviations of the segmentation agreement metrics; (iii) prediction time efficiency; and (iv) model robustness against various degrees of ultrasound imaging noise, simulated with salt-and-pepper noise and Gaussian noise.
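As an illustration of the agreement metrics and noise corruption named above, the following is a minimal NumPy sketch, not the study's actual evaluation code. All function names are hypothetical. It assumes binary masks, image intensities normalized to [0, 1], a salt-and-pepper "SNR" interpreted as the fraction of pixels left uncorrupted, and a Gaussian σ expressed on the normalized intensity scale; the abstract does not specify these conventions.

```python
import numpy as np


def dice(pred, ref):
    """DICE coefficient between two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    # Two empty masks are treated as perfect agreement (an assumption).
    return 2.0 * np.logical_and(pred, ref).sum() / denom if denom else 1.0


def iou(pred, ref):
    """Intersection over Union between two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    union = np.logical_or(pred, ref).sum()
    return np.logical_and(pred, ref).sum() / union if union else 1.0


def hausdorff(pred, ref, spacing_mm=1.0):
    """Symmetric Hausdorff distance over foreground pixels (brute force).

    Both masks must be non-empty. `spacing_mm` is an assumed isotropic
    pixel size used to convert pixel distances to millimetres.
    """
    a = np.argwhere(pred.astype(bool)) * spacing_mm
    b = np.argwhere(ref.astype(bool)) * spacing_mm
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())


def salt_and_pepper(img, snr, rng):
    """Corrupt a fraction (1 - snr) of pixels with 0 or 1 at random.

    Assumption: `snr` is the proportion of pixels left untouched.
    """
    out = img.copy()
    corrupt = rng.random(img.shape) >= snr
    salt = rng.random(img.shape) < 0.5
    out[corrupt & salt] = 1.0
    out[corrupt & ~salt] = 0.0
    return out


def gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)


# Toy example: a 4x4 reference square and a prediction shifted by one column.
ref = np.zeros((8, 8), dtype=bool)
ref[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 3:7] = True

print(dice(pred, ref))       # 0.75
print(iou(pred, ref))        # 0.6
print(hausdorff(pred, ref))  # 1.0

rng = np.random.default_rng(0)
image = rng.random((8, 8))
noisy_sp = salt_and_pepper(image, snr=0.95, rng=rng)
noisy_g = gaussian_noise(image, sigma=0.1, rng=rng)
```

In a robustness test of this kind, the noisy images would be passed to a trained model and the same three metrics recomputed against the unchanged reference masks. The brute-force Hausdorff computation is adequate for small masks; larger images would typically call for a distance-transform or KD-tree approach.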
Results
The nnU-Net achieved the best segmentation agreement (averaged DICE: 0.910, IoU: 0.836, HD: 2.793 mm) and time efficiency (1.456 s). Under mild noise corruption, the nnU-Net outperformed the other models, with averaged DICE of 0.904, IoU of 0.827, and HD of 2.941 mm under salt-and-pepper noise (signal-to-noise ratio, SNR = 0.95), and averaged DICE of 0.906, IoU of 0.830, and HD of 2.967 mm under Gaussian noise (sigma value, σ = 0.1); by contrast, the performance of the U-Net and U-Net+++ models was markedly degraded. Under increasing levels of simulated noise corruption (SNR decreased from 0.95 to 0.75; σ increased from 0.1 to 0.5), the nnU-Net exhibited only a marginal decline in segmentation agreement, maintaining performance close to that observed without noise corruption.
Conclusions
The nnU-Net model was the best-performing midbrain segmentation model in terms of segmentation agreement, stability, time efficiency and robustness, providing the community with an objective, effective and automated alternative to manual segmentation. Moving forward, a multi-center, multi-vendor study is warranted before clinical implementation.