Background: In laparoscopic liver surgery, accurately predicting the displacement of key intrahepatic anatomical structures is crucial for informing the surgeon's intraoperative decision-making. However, because the surgical view is constrained, only a partial surface of the liver is typically visible, making non-rigid volume-to-surface registration essential. Traditional registration methods, however, lack the necessary accuracy and cannot meet real-time requirements.
Purpose: To achieve high-precision liver registration from only partial surface information and to estimate the displacement of internal liver tissues in real time.
Methods: We propose a novel neural network architecture tailored for real-time non-rigid liver volume-to-surface registration. The network uses a voxel-based approach, integrating sparse convolution with a newly proposed points-of-interest (POI) linear attention module, which computes attention only over the previously extracted POI. Additionally, we identified RMSINorm as the most suitable normalization method.
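To make the POI linear attention idea concrete, the following is a minimal PyTorch sketch of kernelized (linear) attention evaluated only over pre-extracted POI features; the class name, head count, and the ELU+1 feature map are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of linear attention restricted to points of interest (POI).
import torch
import torch.nn as nn

class PoiLinearAttention(nn.Module):
    """Linear (kernelized) attention computed only over POI features."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, poi_feats: torch.Tensor) -> torch.Tensor:
        # poi_feats: (B, N_poi, C) -- features sampled at the points of interest
        b, n, c = poi_feats.shape
        h = self.heads
        q, k, v = self.to_qkv(poi_feats).chunk(3, dim=-1)
        q, k, v = (t.reshape(b, n, h, c // h).transpose(1, 2) for t in (q, k, v))
        # ELU+1 feature map gives non-negative "kernelized" q and k,
        # so attention can be evaluated in O(N) instead of O(N^2).
        q = torch.nn.functional.elu(q) + 1
        k = torch.nn.functional.elu(k) + 1
        kv = torch.einsum("bhnd,bhne->bhde", k, v)            # (B, H, d, d)
        z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + 1e-6)
        out = torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)   # (B, H, N, d)
        return self.proj(out.transpose(1, 2).reshape(b, n, c))
```

The linear form avoids the quadratic attention matrix, which is what makes per-frame inference at video rates plausible for a sparse set of POI.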
Results: We evaluated the proposed network against other networks on a dataset generated from real liver models and on two real datasets. Our method achieves an average error of 4.23 mm at a mean frame rate of 65.4 fps on the generated dataset, and an average error of 8.29 mm on the human breathing-motion dataset.
Conclusions: Our network outperforms CNN-based networks and other attention-based networks in both accuracy and inference speed.
Background: Although four-dimensional cone-beam computed tomography (4D-CBCT) is valuable to provide onboard image guidance for radiotherapy of moving targets, it requires a long acquisition time to achieve sufficient image quality for target localization. To improve the utility, it is highly desirable to reduce the 4D-CBCT scanning time while maintaining high-quality images. Current motion-compensated methods are limited by slow speed and compensation errors due to the severe intraphase undersampling.
Purpose: In this work, we aim to propose an alternative feature-compensated method to realize fast 4D-CBCT with high-quality images.
Methods: We proposed a feature-compensated deformable convolutional network (FeaCo-DCN) that performs interphase compensation in the latent feature space, which has not been explored by previous studies. In FeaCo-DCN, encoding networks extract features from each phase; the features of the other phases are then deformed to those of the target phase via deformable convolutional networks. Finally, a decoding network combines and decodes the features from all phases to yield high-quality images of the target phase. The proposed FeaCo-DCN was evaluated using lung cancer patient data.
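As a hedged illustration of inter-phase feature compensation, the sketch below warps one phase's latent features toward the target phase with a deformable convolution (torchvision.ops.DeformConv2d); the layer names and the offset-prediction design are assumptions, not FeaCo-DCN's exact architecture.

```python
# Sketch: deform source-phase features toward the target phase in feature space.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class PhaseFeatureAlign(nn.Module):
    """Warp features of a source phase toward a target phase (hypothetical)."""
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        # Offsets are predicted from the concatenated source/target features
        # (a common design choice; the paper's exact scheme may differ).
        self.offset_net = nn.Conv2d(2 * channels, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(channels, channels, k, padding=k // 2)

    def forward(self, feat_src: torch.Tensor, feat_tgt: torch.Tensor):
        # feat_*: (B, C, H, W) latent features from the encoding networks
        offset = self.offset_net(torch.cat([feat_src, feat_tgt], dim=1))
        return self.deform(feat_src, offset)  # source content, target geometry
```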
Results: (1) FeaCo-DCN generated high-quality images with accurate and clear structures for a fast 4D-CBCT scan; (2) 4D-CBCT images reconstructed by FeaCo-DCN achieved 3D tumor localization accuracy within 2.5 mm; (3) image reconstruction was nearly real time; and (4) FeaCo-DCN achieved superior performance by all metrics compared to the top-ranked techniques in the AAPM SPARE Challenge.
Conclusion: The proposed FeaCo-DCN is effective and efficient in reconstructing 4D-CBCT while reducing about 90% of the scanning time, which can be highly valuable for moving target localization in image-guided radiotherapy.
Purpose: To develop a head and neck normal-structure autocontouring tool that could be used to automatically detect errors in autocontours from a clinically validated autocontouring tool.
Methods: An autocontouring tool based on convolutional neural networks (CNN) was developed for 16 normal structures of the head and neck and tested for its ability to identify contour errors from a clinically validated multiatlas-based autocontouring system (MACS). The computed tomography (CT) scans and clinical contours from 3495 patients were semiautomatically curated and used to train and validate the CNN-based autocontouring tool. The final accuracy of the tool was evaluated by calculating the Sørensen-Dice similarity coefficients (DSC) and Hausdorff distances between the automatically generated contours and physician-drawn contours on 174 internal and 24 external CT scans. Lastly, the CNN-based tool was evaluated on 60 patients' CT scans to investigate its ability to detect contouring failures. The contouring failures on these patients were classified as either minor or major errors. The criteria to detect contouring errors were determined by analyzing the DSC between the CNN- and MACS-based contours under two independent scenarios: (a) contours with minor errors are clinically acceptable and (b) contours with minor errors are clinically unacceptable.
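A minimal sketch of the detection criterion described above: compute the Dice similarity between the MACS and CNN contours and flag the contour when it falls below a threshold. The threshold value here is a placeholder, not the paper's tuned, structure-specific criterion.

```python
# Illustrative DSC-based contour check; the 0.8 threshold is a placeholder.
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Sørensen-Dice similarity between two binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum() + 1e-8)

def flag_contour(macs_mask: np.ndarray, cnn_mask: np.ndarray,
                 threshold: float = 0.8) -> bool:
    """Return True if the MACS contour should be sent for manual review."""
    return dice(macs_mask, cnn_mask) < threshold
```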
Results: The average DSC and Hausdorff distance of our CNN-based tool were 98.4%/1.23 cm for brain, 89.1%/0.42 cm for eyes, 86.8%/1.28 cm for mandible, 86.4%/0.88 cm for brainstem, 83.4%/0.71 cm for spinal cord, 82.7%/1.37 cm for parotids, 80.7%/1.08 cm for esophagus, 71.7%/0.39 cm for lenses, 68.6%/0.72 cm for optic nerves, 66.4%/0.46 cm for cochleas, and 40.7%/0.96 cm for optic chiasm. With the error detection tool, the proportions of the clinically unacceptable MACS contours that were correctly detected were 0.99/0.80 on average except for the optic chiasm, when contours with minor errors are clinically acceptable/unacceptable, respectively. The proportions of the clinically acceptable MACS contours that were correctly detected were 0.81/0.60 on average except for the optic chiasm, when contours with minor errors are clinically acceptable/unacceptable, respectively.
Conclusion: Our CNN-based autocontouring tool performed well on both the publicly available and the internal datasets. Furthermore, our results show that CNN-based algorithms are able to identify ill-defined contours from a clinically validated and used multiatlas-based autocontouring tool. Therefore, our CNN-based tool can effectively perform automatic verification of MACS contours.
Purpose: Numerous image reconstruction methodologies for positron emission tomography (PET) have been developed that incorporate magnetic resonance (MR) imaging structural information, producing reconstructed images with improved suppression of noise and reduced partial volume effects. However, the influence of MR structural information also increases the possibility of suppression or bias of structures present only in the PET data (PET-unique regions). To address this, further developments for MR-informed methods have been proposed, for example, through inclusion of the current reconstructed PET image, alongside the MR image, in the iterative reconstruction process. In this present work, a number of kernel and maximum a posteriori (MAP) methodologies are compared, with the aim of identifying methods that enable a favorable trade-off between the suppression of noise and the retention of unique features present in the PET data.
Methods: The reconstruction methods investigated were: the MR-informed conventional and spatially compact kernel methods, referred to as KEM and KEM largest value sparsification (LVS), respectively; the MR-informed Bowsher and Gaussian MR-guided MAP methods; and the PET-MR-informed hybrid kernel (HKEM) and anato-functional MAP methods. The trade-off between improving the reconstruction of the whole brain region and of the PET-unique regions was investigated for all methods in comparison with postsmoothed maximum likelihood expectation maximization (MLEM), evaluated in terms of the structural similarity index (SSIM), normalized root mean square error (NRMSE), bias, and standard deviation. Both simulated BrainWeb (10 noise realizations) and real [18F]fluorodeoxyglucose (FDG) three-dimensional datasets were used. The real [18F]FDG dataset was augmented with simulated tumors to allow comparison of the reconstruction methodologies for the case of known regions of PET-MR discrepancy, evaluated at the full (100%) and at a reduced (10%) count level.
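For orientation, kernel methods such as KEM parameterize the PET image as x = Kα, where the kernel matrix K is built from MR features and the coefficients α are estimated by MLEM. Below is a hedged sketch of constructing such a kNN Gaussian kernel matrix; the neighbor count and sigma are assumptions, not the paper's settings.

```python
# Sketch of an MR-derived kNN Gaussian kernel matrix for KEM-style
# reconstruction (x = K @ alpha).
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

def mr_kernel_matrix(mr_feats: np.ndarray, n_neighbors: int = 50,
                     sigma: float = 1.0) -> csr_matrix:
    # mr_feats: (n_voxels, n_features) MR patch features, one row per voxel
    knn = NearestNeighbors(n_neighbors=n_neighbors).fit(mr_feats)
    dists, idx = knn.kneighbors(mr_feats)
    w = np.exp(-dists**2 / (2 * sigma**2))          # Gaussian edge weights
    rows = np.repeat(np.arange(len(mr_feats)), n_neighbors)
    K = csr_matrix((w.ravel(), (rows, idx.ravel())),
                   shape=(len(mr_feats), len(mr_feats)))
    # Row-normalize so each voxel's kernel weights sum to one.
    return csr_matrix(K.multiply(1.0 / K.sum(axis=1)))
```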
Results: For the high-count simulated and real data studies, the anato-functional MAP method performed better than the other methods under investigation (MR-informed, PET-MR-informed, and postsmoothed MLEM) in terms of achieving the best trade-off for the reconstruction of the whole brain and the PET-unique regions, assessed by SSIM, NRMSE, and bias vs standard deviation. The inclusion of PET information in the anato-functional MAP method enables the reconstruction of PET-unique regions to attain similarly low levels of bias as unsmoothed MLEM, while moderately improving whole-brain image quality at low levels of regularization. However, for the low-count simulated datasets the anato-functional MAP method performs poorly, owing to the inclusion of noisy PET information in the regularization term. For the low-count simulated dataset, KEM LVS and, to a lesser extent, HKEM performed better than the other methods.
Purpose: In order to attain anatomical models, surgical guides and implants for computer-assisted surgery, accurate segmentation of bony structures in cone-beam computed tomography (CBCT) scans is required. However, this image segmentation step is often impeded by metal artifacts. Therefore, this study aimed to develop a mixed-scale dense convolutional neural network (MS-D network) for bone segmentation in CBCT scans affected by metal artifacts.
Methods: Training data were acquired from 20 dental CBCT scans affected by metal artifacts. An experienced medical engineer segmented the bony structures in all CBCT scans using global thresholding and manually removed all remaining noise and metal artifacts. The resulting gold standard segmentations were used to train an MS-D network comprising 100 convolutional layers while using far fewer trainable parameters than alternative convolutional neural network (CNN) architectures. The bone segmentation performance of the MS-D network was evaluated using a leave-2-out scheme and compared with a clinical snake evolution algorithm and two state-of-the-art CNN architectures (U-Net and ResNet). All segmented CBCT scans were subsequently converted into standard tessellation language (STL) models and geometrically compared with the gold standard.
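As a point of reference for the MS-D architecture, the sketch below follows the published mixed-scale dense design, in which every layer receives all previous feature maps and uses a cycling dilation rate; the one-channel layer width and the 2D formulation are simplifying assumptions.

```python
# Minimal MS-D (mixed-scale dense) network sketch in PyTorch.
import torch
import torch.nn as nn

class MSDNet(nn.Module):
    def __init__(self, in_ch: int = 1, n_layers: int = 100, n_classes: int = 2):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            d = (i % 10) + 1  # dilations cycle through 1..10 to mix scales
            self.layers.append(
                nn.Conv2d(in_ch + i, 1, kernel_size=3, padding=d, dilation=d))
        # Final 1x1 conv maps the dense feature stack to class logits.
        self.final = nn.Conv2d(in_ch + n_layers, n_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for layer in self.layers:
            # Each layer sees the input plus every earlier layer's output.
            feats.append(torch.relu(layer(torch.cat(feats, dim=1))))
        return self.final(torch.cat(feats, dim=1))
```

The dense connectivity with one-channel layers is what keeps the parameter count far below that of U-Net or ResNet at comparable depth.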
Results: CBCT scans segmented using the MS-D network, U-Net, ResNet and the snake evolution algorithm demonstrated mean Dice similarity coefficients of 0.87 ± 0.06, 0.87 ± 0.07, 0.86 ± 0.05, and 0.78 ± 0.07, respectively. The STL models acquired using the MS-D network, U-Net, ResNet and the snake evolution algorithm demonstrated mean absolute deviations of 0.44 mm ± 0.13 mm, 0.43 mm ± 0.16 mm, 0.40 mm ± 0.12 mm and 0.57 mm ± 0.22 mm, respectively. In contrast to the MS-D network, the ResNet introduced wave-like artifacts in the STL models, whereas the U-Net incorrectly labeled background voxels as bone around the vertebrae in 4 of the 9 CBCT scans containing vertebrae.
Conclusion: The MS-D network was able to accurately segment bony structures in CBCT scans affected by metal artifacts.
Purpose: Recently, several attempts have been made to transfer deep learning to medical image reconstruction. An increasing number of publications follow the concept of embedding the computed tomography (CT) reconstruction as a known operator into a neural network. However, most of the approaches presented lack an efficient CT reconstruction framework fully integrated into deep learning environments. As a result, many approaches resort to workarounds for problems that are mathematically unambiguously solvable.
Methods: PYRO-NN is a generalized framework to embed known operators into the prevalent deep learning framework Tensorflow. The current status includes state-of-the-art parallel-, fan-, and cone-beam projectors and back-projectors accelerated with CUDA, provided as Tensorflow layers. On top, the framework provides a high-level Python API to conduct filtered back-projection (FBP) and iterative reconstruction experiments with data from real CT systems.
Results: The framework provides all necessary algorithms and tools to design end-to-end neural network pipelines with integrated CT reconstruction algorithms. The high-level Python API allows the layers to be used as simply as standard Tensorflow layers. All algorithms and tools are referenced to a scientific publication and are compared to existing non-deep-learning reconstruction frameworks. To demonstrate the capabilities of the layers, the framework comes with baseline experiments, which are described in the supplementary material. The framework is available as open-source software under the Apache 2.0 license at https://github.com/csyben/PYRO-NN.
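To illustrate the known-operator concept that PYRO-NN enables, the model below is a generic TensorFlow sketch (deliberately not PYRO-NN's actual API): only a 1D reconstruction filter is learned, while the back-projector is treated as a fixed, differentiable operator. Here backproject_fn is a placeholder for a projector/back-projector layer such as those the framework provides.

```python
# Generic known-operator sketch: trainable filter + fixed back-projector.
import tensorflow as tf

class LearnedFilterFBP(tf.keras.Model):
    """FBP-style pipeline where only the frequency-domain filter is trained."""
    def __init__(self, n_detector: int, backproject_fn):
        super().__init__()
        # Trainable filter, initialized as a ramp in FFT frequency order.
        ramp = tf.abs(tf.signal.fftshift(
            tf.range(n_detector, dtype=tf.float32) - n_detector / 2))
        self.filter = tf.Variable(ramp / n_detector, trainable=True)
        self.backproject = backproject_fn  # known operator, not trained

    def call(self, sinogram):  # sinogram: (batch, n_angles, n_detector)
        freq = tf.signal.fft(tf.cast(sinogram, tf.complex64))
        filtered = tf.math.real(
            tf.signal.ifft(freq * tf.cast(self.filter, tf.complex64)))
        return self.backproject(filtered)
```

Because the back-projector is expressed as a (non-trainable) layer, gradients still flow through it, so the filter can be trained end-to-end against image-domain losses.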
Conclusions: PYRO-NN integrates with the prevalent deep learning framework Tensorflow and allows end-to-end trainable neural networks to be set up in the medical image reconstruction context. We believe that the framework will be a step toward reproducible research and will give the medical physics community a toolkit to elevate medical image reconstruction with new deep learning techniques.
Purpose: This study demonstrated a magnetic resonance (MR) signal multitask learning method for three-dimensional (3D) simultaneous segmentation and relaxometry of human brain tissues.
Materials and methods: A 3D inversion-prepared balanced steady-state free precession sequence was used for acquiring in vivo multicontrast brain images. The deep neural network contained three residual blocks, and each block had 8 fully connected layers with sigmoid activation, layer norm, and 256 neurons in each layer. Online-synthesized MR signal evolutions and labels were used to train the neural network batch-by-batch. Empirically defined ranges of T1 and T2 values for the normal gray matter, white matter, and cerebrospinal fluid (CSF) were used as the prior knowledge. MRI brain experiments were performed on three healthy volunteers. The mean and standard deviation for the T1 and T2 values in vivo were reported and compared to literature values. Additional animal (N = 6) and prostate patient (N = 1) experiments were performed to compare the estimated T1 and T2 values with those from gold standard methods and to demonstrate clinical applications of the proposed method.
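The per-voxel network described above can be sketched as follows; the input embedding, the two output heads, and the exact wiring are assumptions beyond what the abstract states (three residual blocks of eight 256-unit fully connected layers with sigmoid activation and layer normalization).

```python
# Hedged sketch of the multitask signal-to-(T1, T2, tissue class) network.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, width: int = 256, depth: int = 8):
        super().__init__()
        self.net = nn.Sequential(*[
            nn.Sequential(nn.Linear(width, width), nn.LayerNorm(width),
                          nn.Sigmoid())
            for _ in range(depth)])

    def forward(self, x):
        return x + self.net(x)  # residual connection around the block

class SignalNet(nn.Module):
    """Maps one voxel's signal evolution to (T1, T2) and tissue-class logits."""
    def __init__(self, n_timepoints: int, n_tissues: int = 3):
        super().__init__()
        self.embed = nn.Linear(n_timepoints, 256)    # assumed input embedding
        self.blocks = nn.Sequential(*[ResidualBlock() for _ in range(3)])
        self.relax_head = nn.Linear(256, 2)          # T1, T2 estimates
        self.seg_head = nn.Linear(256, n_tissues)    # GM / WM / CSF logits

    def forward(self, signal):
        h = self.blocks(self.embed(signal))
        return self.relax_head(h), self.seg_head(h)
```

Training batch-by-batch on synthesized signal evolutions, as the abstract describes, amounts to sampling (T1, T2, tissue label) triples from the prior ranges, simulating the sequence signal, and regressing both heads jointly.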
Results: In the animal validation experiments, the differences/errors (mean difference ± standard deviation of the difference) between the T1 and T2 values estimated by the proposed method and the ground truth were 113 ± 486 and 154 ± 512 ms for T1, and 5 ± 33 and 7 ± 41 ms for T2, respectively. In the healthy volunteer experiments (N = 3), whole-brain segmentation and relaxometry were completed within ~5 s. The estimated apparent T1 and T2 maps were in accordance with known brain anatomy and were not affected by coil sensitivity variation. Gray matter, white matter, and CSF were successfully segmented. The deep neural network can also generate synthetic T1- and T2-weighted images.
Conclusion: The proposed multitask learning method can directly generate brain apparent T1 and T2 maps, as well as synthetic T1- and T2-weighted images, in conjunction with segmentation of gray matter, white matter, and CSF.

