Multimodal framework for swallow detection in video-fluoroscopic swallow studies using manometric pressure distributions from dysphagic patients.
Pub Date: 2025-12-15, DOI: 10.1007/s11548-025-03556-1
Manuel Maria Loureiro da Rocha, Lisette van der Molen, Marise Neijman, Maarten J A van Alphen, Michiel M W M van den Brekel, Françoise J Siepel
Purpose: Oropharyngeal dysphagia affects up to half of head and neck cancer (HNC) patients. Multi-swallow video-fluoroscopic swallow studies (VFSS) combined with high-resolution impedance manometry (HRIM) offer a comprehensive assessment of swallowing function. However, their use in HNC populations is limited by high clinical workload and complexity of data collection and analysis with existing software.
Methods: To address the data collection challenge, we propose a framework for automatic swallow detection in simultaneous VFSS-HRIM examinations. The framework identifies candidate swallow intervals in continuous VFSS videos using an optimized double-sweep optical flow algorithm. Each candidate interval is then classified using a pressure-based swallow template derived from three annotated samples, leveraging features such as normalized peak-to-peak amplitude, mean, and standard deviation from upper esophageal sphincter sensors.
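The abstract names the classification features (normalized peak-to-peak amplitude, mean, and standard deviation from UES sensors) but not the decision rule. The Python sketch below illustrates one way such a pressure-template classifier could be structured; the function names, distance metric, and tolerance are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def pressure_features(window: np.ndarray) -> np.ndarray:
    """Summarize a UES pressure window by normalized peak-to-peak amplitude,
    mean, and standard deviation (the features named in the abstract)."""
    scale = np.max(np.abs(window))
    scale = scale if scale > 0 else 1.0
    return np.array([np.ptp(window) / scale, window.mean(), window.std()])

def build_template(annotated_windows) -> np.ndarray:
    """Average the feature vectors of a few annotated swallows
    (the study derives its template from three annotated samples)."""
    return np.mean([pressure_features(w) for w in annotated_windows], axis=0)

def is_swallow(candidate: np.ndarray, template: np.ndarray, tol: float = 0.5) -> bool:
    """Accept a candidate interval (found by optical flow on the VFSS video)
    if its features lie close to the template; the Euclidean distance and
    tolerance here are illustrative assumptions."""
    return bool(np.linalg.norm(pressure_features(candidate) - template) < tol)
```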
Results: The methodology was evaluated on 97 swallows from twelve patients treated for head and neck cancer. The detection pipeline achieved 95% recall and a 92% F1-score. Importantly, the number of required HRIM annotations was reduced by 63%, substantially decreasing clinician workload while maintaining high accuracy.
Conclusion: This framework overcomes limitations of current software for simultaneous VFSS-HRIM collection by enabling high-accuracy, low-input swallow detection in HNC patients. Validated on a heterogeneous patient cohort, it lays the groundwork for scalable, objective, and multimodal swallowing assessment.
{"title":"Multimodal framework for swallow detection in video-fluoroscopic swallow studies using manometric pressure distributions from dysphagic patients.","authors":"Manuel Maria Loureiro da Rocha, Lisette van der Molen, Marise Neijman, Marteen J A van Alphen, Michiel M W M van den Brekel, Françoise J Siepel","doi":"10.1007/s11548-025-03556-1","DOIUrl":"https://doi.org/10.1007/s11548-025-03556-1","url":null,"abstract":"<p><strong>Purpose: </strong>Oropharyngeal dysphagia affects up to half of head and neck cancer (HNC) patients. Multi-swallow video-fluoroscopic swallow studies (VFSS) combined with high-resolution impedance manometry (HRIM) offer a comprehensive assessment of swallowing function. However, their use in HNC populations is limited by high clinical workload and complexity of data collection and analysis with existing software.</p><p><strong>Methods: </strong>To address the data collection challenge, we propose a framework for automatic swallow detection in simultaneous VFSS-HRIM examinations. The framework identifies candidate swallow intervals in continuous VFSS videos using an optimized double-sweep optical flow algorithm. Each candidate interval is then classified using a pressure-based swallow template derived from three annotated samples, leveraging features such as normalized peak-to-peak amplitude, mean, and standard deviation from upper esophageal sphincter sensors.</p><p><strong>Results: </strong>The methodology was evaluated on 97 swallows from twelve post-head and neck cancer patients. The detection pipeline achieved 95% Recall and 92% F1-score. Importantly, the number of required HRIM annotations was reduced by 63%, substantially decreasing clinician workload while maintaining high accuracy.</p><p><strong>Conclusion: </strong>This framework overcomes limitations of current software for simultaneous VFSS-HRIM collection by enabling high-accuracy, low-input swallow detection in HNC patients. Validated on a heterogeneous patient cohort, it initiates the groundwork for scalable, objective, and multimodal swallowing assessment.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145764531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Colormap augmentation: a novel method for cross-modality domain generalization.
Pub Date: 2025-12-15, DOI: 10.1007/s11548-025-03559-y
Falko Heitzer, Duc Duy Pham, Wojciech Kowalczyk, Marcus Jäger, Josef Pauli
Purpose: Domain generalization plays a crucial role in analyzing medical images from diverse clinics, scanner vendors, and imaging modalities. Existing methods often require substantial computational resources to train a highly generalized segmentation network, presenting challenges in terms of both availability and cost. The goal of this work is to evaluate a novel, yet simple and effective method for enhancing the generalization of deep learning models in segmentation across varying modalities.
Methods: Eight augmentation methods will be applied individually to a source domain dataset in order to generalize deep learning models. These models will then be tested on completely unseen target domain datasets from a different imaging modality and compared against a lower baseline model. By leveraging standard augmentation techniques, extensive intensity augmentations, and carefully chosen color transformations, we aim to address the domain shift problem, particularly in the cross-modality setting.
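The abstract does not specify how the color transformations are implemented; as a rough illustration of what a colormap-based augmentation could look like, the sketch below maps a grayscale slice through a standard matplotlib colormap to produce a three-channel training image. The colormap choice, normalization, and function name are assumptions, not the definition of the CmapAug method reported in the Results.

```python
import numpy as np
from matplotlib import colormaps

def cmap_augment(slice2d: np.ndarray, cmap_name: str = "viridis") -> np.ndarray:
    """Map a grayscale slice through a colormap, yielding an RGB image that can
    be fed to a segmentation network as an augmented sample."""
    lo, hi = float(slice2d.min()), float(slice2d.max())
    norm = (slice2d - lo) / (hi - lo + 1e-8)   # scale intensities to [0, 1]
    rgba = colormaps[cmap_name](norm)          # (H, W, 4) array in [0, 1]
    return rgba[..., :3].astype(np.float32)    # drop the alpha channel

# Example: augment a random 2D "slice" (stand-in for a CT or MR slice).
augmented = cmap_augment(np.random.rand(256, 256))
```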
Results: Our novel CmapAug method, when combined with standard augmentation techniques, resulted in a substantial improvement in the Dice score, outperforming the baseline. While the baseline struggled to segment the liver structure in some test cases, our selective combination of augmentation methods achieved Dice scores as high as 83.2%.
Conclusion: Our results highlight the general effectiveness of the tested augmentation methods in addressing domain generalization and mitigating the domain shift problem caused by differences in imaging modalities between the source and target domains. The proposed augmentation strategy offers a simple yet powerful solution to this challenge, with significant potential in clinical scenarios where annotated data from the target domain are limited or unavailable.
{"title":"Colormap augmentation: a novel method for cross-modality domain generalization.","authors":"Falko Heitzer, Duc Duy Pham, Wojciech Kowalczyk, Marcus Jäger, Josef Pauli","doi":"10.1007/s11548-025-03559-y","DOIUrl":"https://doi.org/10.1007/s11548-025-03559-y","url":null,"abstract":"<p><strong>Purpose: </strong>Domain generalization plays a crucial role in analyzing medical images from diverse clinics, scanner vendors, and imaging modalities. Existing methods often require substantial computational resources to train a highly generalized segmentation network, presenting challenges in terms of both availability and cost. The goal of this work is to evaluate a novel, yet simple and effective method for enhancing the generalization of deep learning models in segmentation across varying modalities.</p><p><strong>Methods: </strong>Eight augmentation methods will be applied individually to a source domain dataset in order to generalize deep learning models. These models will then be tested on completely unseen target domain datasets from a different imaging modality and compared against a lower baseline model. By leveraging standard augmentation techniques, extensive intensity augmentations, and carefully chosen color transformations, we aim to address the domain shift problem, particularly in the cross-modality setting.</p><p><strong>Results: </strong>Our novel CmapAug method, when combined with standard augmentation techniques, resulted in a substantial improvement in the Dice Score, outperforming the baseline. While the baseline struggled to segment the liver structure in some test cases, our selective combination of augmentation methods achieved Dice scores as high as 83.2%.</p><p><strong>Conclusion: </strong>Our results highlight the general effectiveness of the tested augmentation methods in addressing domain generalization and mitigating the domain shift problem caused by differences in imaging modalities between the source and target domains. The proposed augmentation strategy offers a simple yet powerful solution to this challenge, with significant potential in clinical scenarios where annotated data from the target domain are limited or unavailable.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145758374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating workload and usability of remote magnetic navigation for catheter ablation.
Pub Date: 2025-12-15, DOI: 10.1007/s11548-025-03558-z
Florian Heemeyer, Leonardo E Guido Lopez, Miguel E Jáuregui Abularach, Beatriz Sanz Verdejo, Quentin Boehler, Oliver Brinkmann, José L Merino, Bradley J Nelson
Purpose: Robotic systems for catheter ablation have been in clinical use for many years. While their impact on the clinical outcome and procedure times is well studied, aspects like usability and operator workload have received limited attention in the literature. Reduced workload and stress levels benefit the operator's mental and physical health, and can also lower the risk of errors and ultimately improve patient safety. The aim of this study is to investigate the workload and usability of remote magnetic navigation compared to conventional manual navigation.
Methods: We performed a user study with eight electrophysiologists. Each participant performed identical in-vitro navigation tasks replicating those found in pulmonary vein isolation using both manual and magnetic navigation. Magnetic navigation experiments were performed using the Navion, a mobile electromagnetic navigation system.
Results: Magnetic navigation significantly improved usability (p < 0.02) and workload (p < 0.01) compared to manual navigation, measured using the System Usability Scale (magnetic: 85.6 ± 9.3 vs. manual: 75.0 ± 17.8) and NASA Task Load Index (magnetic: 72.4 ± 13.5 vs. manual: 45.8 ± 16.7). Additionally, task completion times were shorter (p < 0.01) with magnetic navigation (284.6 ± 80.7 s) compared to manual navigation (411.0 ± 123.7 s).
Conclusion: The findings of this study suggest that remote magnetic navigation using the Navion significantly improves operator experiences in terms of workload and usability, reinforcing the case for wider adoption of well-designed robotic systems in cardiac electrophysiology labs.
{"title":"Investigating workload and usability of remote magnetic navigation for catheter ablation.","authors":"Florian Heemeyer, Leonardo E Guido Lopez, Miguel E Jáuregui Abularach, Beatriz Sanz Verdejo, Quentin Boehler, Oliver Brinkmann, José L Merino, Bradley J Nelson","doi":"10.1007/s11548-025-03558-z","DOIUrl":"https://doi.org/10.1007/s11548-025-03558-z","url":null,"abstract":"<p><strong>Purpose: </strong>Robotic systems for catheter ablation have been in clinical use for many years. While their impact on the clinical outcome and procedure times is well studied, aspects like usability and operator workload have received limited attention in the literature. Reduced workload and stress levels benefit the operator's mental and physical health, and can also lower the risk of errors and ultimately improve patient safety. The aim of this study is to investigate the workload and usability of remote magnetic navigation compared to conventional manual navigation.</p><p><strong>Methods: </strong>We performed a user study with eight electrophysiologists. Each participant performed identical in-vitro navigation tasks replicating those found in pulmonary vein isolation using both manual and magnetic navigation. Magnetic navigation experiments were performed using the Navion, a mobile electromagnetic navigation system.</p><p><strong>Results: </strong>Magnetic navigation significantly improved usability (p < 0.02) and workload (p < 0.01) compared to manual navigation, measured using the System Usability Scale (magnetic: 85.6 ± 9.3 vs. manual: 75.0 ± 17.8) and NASA Task Load Index (magnetic: 72.4 ± 13.5 vs. manual: 45.8 ± 16.7). Additionally, task completion times were shorter (p < 0.01) with magnetic navigation (284.6 ± 80.7 s) compared to manual navigation (411.0 ± 123.7 s).</p><p><strong>Conclusion: </strong>The findings of this study suggest that remote magnetic navigation using the Navion significantly improves operator experiences in terms of workload and usability, reinforcing the case for wider adoption of well-designed robotic systems in cardiac electrophysiology labs.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145764582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ReShapeIT: reliable shape interaction with implicit template for medical anatomy reconstruction.
Pub Date: 2025-12-13, DOI: 10.1007/s11548-025-03557-0
Minghui Zhang, Yun Gu
Purpose: Shape modeling of volumetric medical images plays a crucial role in quantitative analysis and surgical planning for computer-aided diagnosis. However, automatic shape reconstruction from deep learning models often suffers from limited image resolution and the lack of shape prior constraints. This study aims to address these challenges by developing a method that enables reliable and accurate anatomical shape modeling in the continuous space.
Methods: We present the Reliable Shape Interaction with Implicit Template (ReShapeIT) network, which represents anatomical structures using continuous implicit fields rather than discrete voxel grids. The approach combines a category-specific implicit template field with a deformation network to encode anatomical shapes from training shapes. In addition, a Template Interaction Module (TIM) is designed to refine test cases by aligning learned template shapes with instance-specific latent codes.
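Only the high-level design of ReShapeIT is given here; for orientation, the sketch below shows a generic DeepSDF-style conditional implicit field that maps a 3D coordinate and an instance latent code to a signed distance value. It is a minimal stand-in under assumed layer sizes, not the actual ReShapeIT template or deformation network.

```python
import torch
import torch.nn as nn

class ImplicitField(nn.Module):
    """Generic conditional implicit field: (x, y, z) + latent code -> signed distance."""

    def __init__(self, latent_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz: torch.Tensor, code: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) query points; code: (latent_dim,) per-instance latent code
        cond = code.expand(xyz.shape[0], -1)
        return self.net(torch.cat([xyz, cond], dim=-1)).squeeze(-1)

# Query 1024 random points of one shape instance (all-zero latent code).
sdf_values = ImplicitField()(torch.rand(1024, 3), torch.zeros(128))
```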
Results: We evaluated ReShapeIT on three anatomical datasets: Liver, Pancreas, and Lung Lobe. The proposed method outperforms state-of-the-art approaches in 3D shape reconstruction, achieving Chamfer Distance/Earth Mover's Distance scores of 0.225/0.318 for Liver, 0.125/0.067 for Pancreas, and 0.414/0.098 for Lung Lobe.
Conclusion: ReShapeIT provides a reliable and generalizable solution for implicit anatomical shape modeling by leveraging shared template priors and instance-level deformations. The implementation is publicly available at: https://github.com/EndoluminalSurgicalVision-IMR/ReShapeIT .
{"title":"Reshapeit: reliable shape interaction with implicit template for medical anatomy reconstruction.","authors":"Minghui Zhang, Yun Gu","doi":"10.1007/s11548-025-03557-0","DOIUrl":"https://doi.org/10.1007/s11548-025-03557-0","url":null,"abstract":"<p><strong>Purpose: </strong>Shape modeling of volumetric medical images plays a crucial role in quantitative analysis and surgical planning for computer-aided diagnosis. However, automatic shape reconstruction from deep learning models often suffers from limited image resolution and the lack of shape prior constraints. This study aims to address these challenges by developing a method that enables reliable and accurate anatomical shape modeling in the continuous space.</p><p><strong>Methods: </strong>We present the Reliable Shape Interaction with Implicit Template (ReShapeIT) network, which represents anatomical structures using continuous implicit fields rather than discrete voxel grids. The approach combines a category-specific implicit template field with a deformation network to encode anatomical shapes from training shapes. In addition, a Template Interaction Module (TIM) is designed to refine test cases by aligning learned template shapes with instance-specific latent codes.</p><p><strong>Results: </strong>We evaluated ReShapeIT on three anatomical datasets-Liver, Pancreas, and Lung Lobe. The proposed method outperforms state-of-the-art approaches in 3D shape reconstruction, achieving Chamfer Distance/Earth Mover's Distance scores of 0.225/0.318 for Liver, 0.125/0.067 for Pancreas, and 0.414/0.098 for Lung Lobe.</p><p><strong>Conclusion: </strong>ReShapeIT provides a reliable and generalizable solution for implicit anatomical shape modeling by leveraging shared template priors and instance-level deformations. The implementation is publicly available at: https://github.com/EndoluminalSurgicalVision-IMR/ReShapeIT .</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145745328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning-based treatment outcome prediction in head and neck cancer using integrated noninvasive diagnostics.
Pub Date: 2025-12-08, DOI: 10.1007/s11548-025-03539-2
Melda Yeghaian, Stefano Trebeschi, Marina Herrero-Huertas, Francisco Javier Mendoza Ferradás, Paula Bos, Maarten J A van Alphen, Marcel A J van Gerven, Regina G H Beets-Tan, Zuhir Bodalal, Lilly-Ann van der Velden
Purpose: Accurate prediction of treatment outcomes is crucial for personalized treatment in head and neck squamous cell carcinoma (HNSCC). Beyond one-year survival, assessing long-term enteral nutrition dependence is essential for optimizing patient counseling and resource allocation. This preliminary study aimed to predict one-year survival and feeding tube dependence in surgically treated HNSCC patients using classical machine learning.
Methods: This proof-of-principle retrospective study included 558 surgically treated HNSCC patients. Baseline clinical data, routine blood markers, and MRI-based radiomic features were collected before treatment. Additional postsurgical treatments within one year were also recorded. Random forest classifiers were trained to predict one-year survival and feeding tube dependence. Model explainability was assessed using Shapley Additive exPlanation (SHAP) values.
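For readers less familiar with this kind of pipeline, the following sketch shows a random forest evaluated with stratified cross-validation and explained with SHAP values, matching the modeling and explainability tools named in the abstract. The feature matrix is a random placeholder; column counts and hyperparameters are assumptions.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder feature matrix (clinical, blood, and radiomic columns) and
# one-year survival labels; the real study uses 558 patients.
rng = np.random.default_rng(0)
X = rng.random((558, 40))
y = rng.integers(0, 2, size=558)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print(f"AUC = {auc.mean():.2f} +/- {auc.std():.2f}")

# Explainability via SHAP values, as in the abstract.
clf.fit(X, y)
shap_values = shap.TreeExplainer(clf).shap_values(X)
```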
Results: Using tenfold stratified cross-validation, clinical data showed the highest predictive performance for survival (AUC = 0.75 ± 0.10; p < 0.001). Blood (AUC = 0.67 ± 0.17; p = 0.001) and imaging (AUC = 0.68 ± 0.16; p = 0.26) showed moderate performance, and multimodal integration did not improve predictions (AUC = 0.68 ± 0.16; p = 0.38). For feeding tube dependence, all modalities had low predictive power (AUC ≤ 0.66; p > 0.05). However, postsurgical treatment information outperformed all other modalities (AUC = 0.67 ± 0.07; p = 0.002), but had the lowest predictive value for survival (AUC = 0.57 ± 0.11; p = 0.08).
Conclusion: Clinical data appeared to be the strongest predictor of one-year survival in surgically treated HNSCC, although overall predictive performance was moderate. Postsurgical treatment information played a key role in predicting tube feeding dependence. While multimodal integration did not enhance overall model performance, it showed modest gains for weaker individual modalities, suggesting potential complementarity that warrants further investigation.
{"title":"Machine learning-based treatment outcome prediction in head and neck cancer using integrated noninvasive diagnostics.","authors":"Melda Yeghaian, Stefano Trebeschi, Marina Herrero-Huertas, Francisco Javier Mendoza Ferradás, Paula Bos, Maarten J A van Alphen, Marcel A J van Gerven, Regina G H Beets-Tan, Zuhir Bodalal, Lilly-Ann van der Velden","doi":"10.1007/s11548-025-03539-2","DOIUrl":"https://doi.org/10.1007/s11548-025-03539-2","url":null,"abstract":"<p><strong>Purpose: </strong>Accurate prediction of treatment outcomes is crucial for personalized treatment in head and neck squamous cell carcinoma (HNSCC). Beyond one-year survival, assessing long-term enteral nutrition dependence is essential for optimizing patient counseling and resource allocation. This preliminary study aimed to predict one-year survival and feeding tube dependence in surgically treated HNSCC patients using classical machine learning.</p><p><strong>Methods: </strong>This proof-of-principle retrospective study included 558 surgically treated HNSCC patients. Baseline clinical data, routine blood markers, and MRI-based radiomic features were collected before treatment. Additional postsurgical treatments within one year were also recorded. Random forest classifiers were trained to predict one-year survival and feeding tube dependence. Model explainability was assessed using Shapley Additive exPlanation (SHAP) values.</p><p><strong>Results: </strong>Using tenfold stratified cross-validation, clinical data showed the highest predictive performance for survival (AUC = 0.75 ± 0.10; p < 0.001). Blood (AUC = 0.67 ± 0.17; p = 0.001) and imaging (AUC = 0.68 ± 0.16; p = 0.26) showed moderate performance, and multimodal integration did not improve predictions (AUC = 0.68 ± 0.16; p = 0.38). For feeding tube dependence, all modalities had low predictive power (AUC ≤ 0.66; p > 0.05). However, postsurgical treatment information outperformed all other modalities (AUC = 0.67 ± 0.07; p = 0.002), but had the lowest predictive value for survival (AUC = 0.57 ± 0.11; p = 0.08).</p><p><strong>Conclusion: </strong>Clinical data appeared to be the strongest predictor of one-year survival in surgically treated HNSCC, although overall predictive performance was moderate. Postsurgical treatment information played a key role in predicting tube feeding dependence. While multimodal integration did not enhance overall model performance, it showed modest gains for weaker individual modalities, suggesting potential complementarity that warrants further investigation.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145702875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Glioblastoma survival prediction through MRI and clinical data integration with transfer learning.
Pub Date: 2025-12-04, DOI: 10.1007/s11548-025-03548-1
A Marasi, D Milesi, D Aquino, F M Doniselli, R Pascuzzo, M Grisoli, A Redaelli, E De Momi
Purpose: Accurate prediction of overall survival (OS) in glioblastoma patients is critical for advancing personalized treatments and improving clinical trial design. Conventional radiomics approaches rely on manually engineered features, which limit their ability to capture complex, high-dimensional imaging patterns. This study employs a deep learning architecture to process MRI data for automated glioma segmentation and feature extraction, leveraging high-level representations from the encoder's latent space.
Methods: Multimodal MRI data from the BraTS2020 dataset and a proprietary dataset from Fondazione IRCCS Istituto Neurologico Carlo Besta (Milan, Italy) were processed independently using a U-Net-like model pre-trained on BraTS2018 and fine-tuned on BraTS2020. Features extracted from the encoder's latent space represented hierarchical imaging patterns. These features were combined with a clinical variable (patient age) and reduced via principal component analysis (PCA) to enhance computational efficiency. Machine learning classifiers, including random forest, XGBoost, and a fully connected neural network, were trained on the reduced feature vectors for OS classification.
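As a minimal sketch of this downstream classification step, the snippet below combines placeholder encoder latents with age, applies PCA, and trains a small fully connected classifier. Array shapes, the number of principal components, and hyperparameters are illustrative assumptions, not the study's settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
latent = rng.random((200, 512))        # stand-in for U-Net encoder latent features
age = rng.random((200, 1))             # the clinical variable named in the abstract
y = rng.integers(0, 2, size=200)       # placeholder OS class labels

features = np.hstack([latent, age])
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=32),              # dimensionality reduction step from the abstract
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
)
model.fit(features, y)
```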
Results: In the four-modality BraTS4CH setting, the multi-layer perceptron (MLP) achieved the best performance (F1 = 0.71, AUC = 0.74, accuracy = 0.71). When limited to two modalities on BraTS2020 (BraTS2CH), the MLP again led (F1 = 0.67, AUC = 0.70, accuracy = 0.67). On the IRCCS Besta two-modality cohort (Besta2CH), XGBoost produced the highest F1-score and accuracy (F1 = 0.65, accuracy = 0.66), while the MLP obtained the top AUC (0.70). These results are competitive with, and on some metrics exceed, state-of-the-art reports, demonstrating the robustness and scalability of our automated framework relative to traditional radiomics and AI-driven approaches.
Conclusion: Integrating encoder-derived features from multimodal MRI data with clinical variables offers a scalable and effective approach for OS prediction in glioblastoma patients. This study demonstrates the potential of deep learning to address traditional radiomics limitations, paving the way for more precise and personalized prognostic tools.
{"title":"Glioblastoma survival prediction through MRI and clinical data integration with transfer learning.","authors":"A Marasi, D Milesi, D Aquino, F M Doniselli, R Pascuzzo, M Grisoli, A Redaelli, E De Momi","doi":"10.1007/s11548-025-03548-1","DOIUrl":"https://doi.org/10.1007/s11548-025-03548-1","url":null,"abstract":"<p><strong>Purpose: </strong>Accurate prediction of overall survival (OS) in glioblastoma patients is critical for advancing personalized treatments and improving clinical trial design. Conventional radiomics approaches rely on manually engineered features, which limit their ability to capture complex, high-dimensional imaging patterns. This study employs a deep learning architecture to process MRI data for automated glioma segmentation and feature extraction, leveraging high-level representations from the encoder's latent space.</p><p><strong>Methods: </strong>Multimodal MRI data from the BraTS2020 dataset and a proprietary dataset from Fondazione IRCCS Istituto Neurologico Carlo Besta (Milan, Italy) were processed independently using a U-Net-like model pre-trained on BraTS2018 and fine-tuned on BraTS2020. Features extracted from the encoder's latent space represented hierarchical imaging patterns. These features were combined with clinical variable (patient's age) and reduced via principal component analysis (PCA) to enhance computational efficiency. Machine learning classifiers-including random forest, XGBoost, and a fully connected neural network-were trained on the reduced feature vectors for OS classification.</p><p><strong>Results: </strong>In the four-modality BraTS4CH setting, the multi-layer perceptron achieved the best performance (F1 = 0.71, AUC = 0.74, accuracy = 0.71). When limited to two modalities on BraTS2020 (BraTS2CH), MLP again led (F1 = 0.67, AUC = 0.70, accuracy = 0.67). On the IRCCS Besta two-modality cohort (Besta2CH), XGBoost produced the highest F1-score and accuracy (F1 = 0.65, accuracy = 0.66), while MLP obtained the top AUC (0.70). These results are competitive with-and in some metrics exceed-state-of-the-art reports, demonstrating the robustness and scalability of our automated framework relative to traditional radiomics and AI-driven approaches.</p><p><strong>Conclusion: </strong>Integrating encoder-derived features from multimodal MRI data with clinical variables offers a scalable and effective approach for OS prediction in glioblastoma patients. This study demonstrates the potential of deep learning to address traditional radiomics limitations, paving the way for more precise and personalized prognostic tools.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145670828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In-depth characterization of a laparoscopic radical prostatectomy procedure based on surgical process modeling.
Pub Date: 2025-12-03, DOI: 10.1007/s11548-025-03552-5
Nuno S Rodrigues, Pedro Morais, Lukas R Buschle, Estevão Lima, João L Vilaça
Purpose: Minimally invasive surgical approaches are currently the standard of care for men with prostate cancer, offering higher rates of erectile function preservation. With these laparoscopic techniques, an increasing amount of data and information becomes available. Adaptive systems can play an important role by acting as an intelligent information filter, ensuring that the available information is useful for the procedure rather than overwhelming for the surgeon. Standardizing and structuring the surgical workflow are key requirements for such smart assistants to recognize the different surgical steps from contextual information about the environment. This work aims to provide a detailed characterization of the laparoscopic radical prostatectomy procedure, focusing on the formalization of medical expert knowledge via surgical process modeling.
Methods: Data were acquired manually, via online and offline observation and discussion with medical experts. A total of 14 procedures were observed. Both manual laparoscopic radical prostatectomy and robot-assisted laparoscopic prostatectomy were studied. The derived surgical process model (SPM) covers only the intraoperative part of the procedure, with constant feedback from the endoscopic camera. A dedicated Excel template was developed for surgery observation.
Results: The final model is represented in a descriptive and numerical format, combining task descriptions with a workflow diagram for ease of interpretation. Practical applications of the generated surgical process model are exemplified by the creation of activation trees for surgical phase identification. Anatomical structures are reported for each phase, distinguishing between visible and inferable ones. Additionally, the surgeons involved, the surgical instruments used, and the actions performed in each phase are identified. A total of 11 phases were identified and characterized. The average surgery duration was 87 min.
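To illustrate how such a surgical process model could be encoded in software, the sketch below defines a hypothetical phase record covering the categories reported here (visible and inferable structures, surgeons, instruments, actions). Field names and the example phase are illustrative only and are not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class SurgicalPhase:
    """One phase of a surgical process model (illustrative schema)."""
    name: str
    visible_structures: list = field(default_factory=list)
    inferable_structures: list = field(default_factory=list)
    surgeons: list = field(default_factory=list)
    instruments: list = field(default_factory=list)
    actions: list = field(default_factory=list)

# Hypothetical example; the actual 11 phases are defined in the paper.
example_phase = SurgicalPhase(
    name="Bladder neck dissection",
    visible_structures=["bladder neck", "prostate base"],
    surgeons=["main surgeon"],
    instruments=["monopolar scissors", "grasper"],
    actions=["dissect", "coagulate"],
)
```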
Conclusion: The generated surgical process model is a first step toward the development of a context-aware surgical assistant and can potentially be used as a roadmap by other research teams, operating room managers and surgical teams.
{"title":"In-depth characterization of a laparoscopic radical prostatectomy procedure based on surgical process modeling.","authors":"Nuno S Rodrigues, Pedro Morais, Lukas R Buschle, Estevão Lima, João L Vilaça","doi":"10.1007/s11548-025-03552-5","DOIUrl":"https://doi.org/10.1007/s11548-025-03552-5","url":null,"abstract":"<p><strong>Purpose: </strong>Minimally invasive surgical approaches are currently the standard of care for men with prostate cancer, presenting higher rates of erectile function preservation. With these laparoscopic techniques, there is an increasing amount of data and information available. Adaptive systems can play an important role, acting as an intelligent information filter, assuring that all the available information can become useful for the procedure and not overwhelming for the surgeon. Standardizing and structuring the surgical workflow are key requirements for such smart assistants to recognize the different surgical steps through context information about the environment. This work aims to do a detailed characterization of a laparoscopic radical prostatectomy procedure, focusing on the formalization of medical expert knowledge, via surgical process modeling.</p><p><strong>Methods: </strong>Data were acquired manually, via online and offline observation, and discussion with medical experts. A total of 14 procedures were observed. Both manual laparoscopic radical prostatectomy and robot-assisted laparoscopic prostatectomy were studied. The derived SPM focuses only on the intraoperatory part of the procedure, with constant feedback from the endoscopic camera. For surgery observation, a dedicated Excel template was developed.</p><p><strong>Results: </strong>The final model is represented in a descriptive and numerical format, combining task description with a workflow diagram arrangement for ease of interpretation. Practical applications of the generated surgical process model are exemplified with the creation of activation trees for surgical phase identification. Anatomical structures are reported for each phase, distinguishing between visible and inferable ones. Additionally, the surgeons involved are identified, surgical instruments, and actions performed in each phase. A total of 11 phases were identified and characterized. Average surgery duration is 87 min.</p><p><strong>Conclusion: </strong>The generated surgical process model is a first step toward the development of a context-aware surgical assistant and can potentially be used as a roadmap by other research teams, operating room managers and surgical teams.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145670873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weakly supervised pre-training for surgical step recognition using unannotated and heterogeneously labeled videos.
Pub Date: 2025-12-02, DOI: 10.1007/s11548-025-03555-2
Sreeram Kamabattula, Kai Chen, Kiran Bhattacharyya
Purpose: Surgical video review is essential for minimally invasive surgical training, but manual annotation of surgical steps is time-consuming and limits scalability. We propose a weakly supervised pre-training framework that leverages unannotated or heterogeneously labeled surgical videos to improve automated surgical step recognition.
Methods: We evaluate three types of weak labels derived from unannotated datasets: (1) surgical phases from the same or other procedures, (2) surgical steps from different procedure types, and (3) intraoperative time progression. Using datasets from four robotic-assisted procedures (sleeve gastrectomy, hysterectomy, cholecystectomy, and radical prostatectomy), we simulate real-world annotation scarcity by varying the proportion of available step annotations (α ∈ {0.25, 0.5, 0.75, 1.0}). We benchmark the performance of a 2D CNN model trained with and without weak label pre-training.
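The sampling scheme below sketches how the annotation-scarcity settings could be simulated: step labels are retained for a fraction α of videos and the remainder are treated as weakly labeled. This is an assumed protocol for illustration, not necessarily the authors' exact procedure.

```python
import random

def subsample_step_annotations(video_ids, alpha, seed=0):
    """Keep full step annotations for a fraction `alpha` of videos; the rest are
    available only with weak labels (phases, other-procedure steps, or time)."""
    rng = random.Random(seed)
    k = max(1, int(round(alpha * len(video_ids))))
    labeled = set(rng.sample(list(video_ids), k))
    weak = [v for v in video_ids if v not in labeled]
    return sorted(labeled), weak

# Example: 100 videos with step labels kept for 25% of them.
labeled_videos, weakly_labeled_videos = subsample_step_annotations(range(100), alpha=0.25)
```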
Results: Pre-training with surgical phase labels, particularly from the same procedure type (PHASE-WITHIN), consistently improved step recognition performance, with gains up to 6.4 F1-score points over standard ImageNet-based models under limited annotation conditions (α = 0.25 on SLG). Cross-procedure step pre-training was beneficial for some procedures, and time-based labels provided moderate gains depending on procedure structure. Label efficiency analysis shows the baseline model would require labeling an additional 30-60 videos at α = 0.25 to match the performance achieved by the best weak-pretraining strategy across procedures.
Conclusion: Weakly supervised pre-training offers a practical strategy to improve surgical step recognition when annotated data is scarce. This approach can support scalable feedback and assessment in surgical training workflows where comprehensive annotations are infeasible.
{"title":"Weakly supervised pre-training for surgical step recognition using unannotated and heterogeneously labeled videos.","authors":"Sreeram Kamabattula, Kai Chen, Kiran Bhattacharyya","doi":"10.1007/s11548-025-03555-2","DOIUrl":"https://doi.org/10.1007/s11548-025-03555-2","url":null,"abstract":"<p><strong>Purpose: </strong>Surgical video review is essential for minimally invasive surgical training, but manual annotation of surgical steps is time-consuming and limits scalability. We propose a weakly supervised pre-training framework that leverages unannotated or heterogeneously labeled surgical videos to improve automated surgical step recognition.</p><p><strong>Methods: </strong>We evaluate three types of weak labels derived from unannotated datasets: (1) surgical phases from the same or other procedures, (2) surgical steps from different procedure types, and (3) intraoperative time progression. Using datasets from four robotic-assisted procedures (sleeve gastrectomy, hysterectomy, cholecystectomy, and radical prostatectomy), we simulate real-world annotation scarcity by varying the proportion of available step annotations ( <math><mi>α</mi></math> <math><mo>∈</mo></math> 0.25, 0.5, 0.75, 1.0). We benchmark the performance of a 2D CNN model trained with and without weak label pre-training.</p><p><strong>Results: </strong>Pre-training with surgical phase labels-particularly from the same procedure type (PHASE-WITHIN)-consistently improved step recognition performance, with gains up to 6.4 f1-score points over standard ImageNet-based models under limited annotation conditions ( <math><mi>α</mi></math> = 0.25 on SLG). Cross-procedure step pre-training was beneficial for some procedures, and time-based labels provided moderate gains depending on procedure structure. Label efficiency analysis shows the baseline model would require labeling an additional 30-60 videos at <math><mi>α</mi></math> = 0.25 to match the performance achieved by the best weak-pretraining strategy across procedures.</p><p><strong>Conclusion: </strong>Weakly supervised pre-training offers a practical strategy to improve surgical step recognition when annotated data is scarce. This approach can support scalable feedback and assessment in surgical training workflows where comprehensive annotations are infeasible.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145656095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Point cloud registration algorithm using liver vascular skeleton feature with computed tomography and ultrasonography image fusion.
Pub Date: 2025-12-01, DOI: 10.1007/s11548-025-03496-w
Satoshi Miura, Masayuki Nakayama, Kexin Xu, Zhang Bo, Ryoko Kuromatsu, Masahito Nakano, Yu Noda, Takumi Kawaguchi
Purpose: Radiofrequency ablation for liver cancer has advanced rapidly. For accurate ultrasound-guided soft-tissue puncture surgery, intraoperative ultrasound images must be fused with preoperative computed tomography images. However, conventional methods struggle to estimate the alignment and fuse the images accurately. To address this issue, the present study proposes an algorithm for registering cross-source point clouds based not on the organ surface but on the geometric features of the vascular point cloud.
Methods: We developed a fusion system that performs cross-source point cloud registration between ultrasound and computed tomography images, extracting the nodes, skeleton, and geometric features of the vascular point cloud. The system completes the fusion process in an average of 14.5 s after acquiring the vascular point clouds via ultrasound.
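The abstract does not name the registration solver used after feature extraction. As a generic stand-in, the sketch below rigidly aligns an ultrasound-derived vascular skeleton point cloud to a CT-derived one with plain point-to-point ICP via Open3D; this is not the authors' vascular-feature matching method, only an illustration of the registration step's inputs and output.

```python
import numpy as np
import open3d as o3d

def register_skeletons(us_points: np.ndarray, ct_points: np.ndarray,
                       max_corr_dist: float = 5.0) -> np.ndarray:
    """Rigidly align two (N, 3) skeleton point clouds and return a 4x4 transform."""
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(us_points))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(ct_points))
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, max_corr_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation  # homogeneous US-to-CT rigid transform
```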
Results: Experiments were conducted to fuse liver images using a dummy model and healthy participants, respectively. The results show that, in a liver dummy model registration experiment, the proposed method achieved a registration error within 1.4 mm and significantly decreased the target registration error compared to other methods. Furthermore, the proposed method achieved an average RMSE within 2.23 mm on a human liver vascular skeleton.
Conclusion: Because the registration method using vascular feature point clouds enables rapid and accurate fusion of ultrasound and computed tomography images, it is suitable for application to real puncture surgery for radiofrequency ablation of the liver. In future work, we will evaluate the proposed method in patients.
{"title":"Point cloud registration algorithm using liver vascular skeleton feature with computed tomography and ultrasonography image fusion.","authors":"Satoshi Miura, Masayuki Nakayama, Kexin Xu, Zhang Bo, Ryoko Kuromatsu, Masahito Nakano, Yu Noda, Takumi Kawaguchi","doi":"10.1007/s11548-025-03496-w","DOIUrl":"10.1007/s11548-025-03496-w","url":null,"abstract":"<p><strong>Purpose: </strong>Radiofrequency ablation for liver cancer has advanced rapidly. For accurate ultrasound-guided soft-tissue puncture surgery, it is necessary to fuse intraoperative ultrasound images with preoperative computed tomography images. However, the conventional method is difficult to estimate and fuse images accurately. To address this issue, the present study proposes an algorithm for registering cross-source point clouds based on not surface but the geometric features of the vascular point cloud.</p><p><strong>Methods: </strong>We developed a fusion system that performs cross-source point cloud registration between ultrasound and computed tomography images, extracting the node, skeleton, and geomatic feature of the vascular point cloud. The system completes the fusion process in an average of 14.5 s after acquiring the vascular point clouds via ultrasound.</p><p><strong>Results: </strong>The experiments were conducted to fuse liver images by the dummy model and the healthy participants, respectively. The results show the proposed method achieved a registration error within 1.4 mm and decreased the target registration error significantly compared to other methods in a liver dummy model registration experiment. Furthermore, the proposed method achieved the averaged RMSE within 2.23 mm in a human liver vascular skeleton.</p><p><strong>Conclusion: </strong>The study concluded that because the registration method using vascular feature point cloud could realize the rapid and accurate fusion between ultrasound and computed tomography images, the method is useful to apply the real puncture surgery for radiofrequency ablation for liver. In future work, we will evaluate the proposed method by the patients.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2469-2478"},"PeriodicalIF":2.3,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12689734/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144977838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synthetic X-Q space learning for diffusion MRI parameter estimation: a pilot study in breast DKI.
Pub Date: 2025-12-01, DOI: 10.1007/s11548-025-03550-7
Yoshitaka Masutani, Kousei Konya, Erina Kato, Naoko Mori, Hideki Ota, Shunji Mugikura, Kei Takase, Yuki Ichinoseki
Purpose: For diffusion MRI (dMRI) parameter estimation, machine-learning approaches have shown promising results, including synthetic Q-space learning (synQSL), which trains regressors with only synthetic data. In this study, we aimed to develop a new method, named synthetic X-Q space learning (synXQSL), to improve robustness, and we investigated its basic characteristics.
Methods: For the training data, local parameter patterns of 3 × 3 voxels were synthesized by a linear combination of six bases, with parameters estimated at the center voxel. We prepared three types of local patterns by choosing the number of bases: flat, linear, and quadratic. Then, at each location of the 3 × 3 voxels, signal values of the diffusion-weighted image were computed using the signal model equation for diffusional kurtosis imaging together with Rician noise simulation. A multi-layer perceptron was used for parameter estimation and was trained for each parameter at various noise levels. The noise level is controlled by a noise ratio, defined as the standard deviation of the Rician noise distribution normalized by the average b = 0 signal value. Experiments for visual and quantitative validation were performed with synthetic data, a digital phantom, and clinical breast datasets, in comparison with previous methods.
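The Rician noise model used for training-data synthesis follows a standard magnitude-signal formulation; a minimal sketch, with the noise ratio defined as in the abstract and illustrative parameter values, is shown below.

```python
import numpy as np

def add_rician_noise(signal: np.ndarray, noise_ratio: float, s0_mean: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Corrupt DWI signal values with Rician noise. The Gaussian standard
    deviation is the noise ratio times the average b = 0 signal value,
    matching the definition given in the abstract."""
    sigma = noise_ratio * s0_mean
    real = signal + rng.normal(0.0, sigma, size=signal.shape)
    imag = rng.normal(0.0, sigma, size=signal.shape)
    return np.sqrt(real ** 2 + imag ** 2)

# Example: noise a 3x3 patch of DWI values at a 5% noise ratio.
rng = np.random.default_rng(0)
noisy_patch = add_rician_noise(np.full((3, 3), 800.0), 0.05, 1000.0, rng)
```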
Results: On the synthetic datasets, synXQSL outperformed synQSL in parameter estimation from noisy data. The digital phantom experiments showed that different combinations of synXQSL bases yield different results, and the quadratic pattern appears to be a reasonable choice. The clinical data experiments indicate that synXQSL suppresses noise in the estimated parameter maps and consequently yields higher contrast.
Conclusion: The basic characteristics of synXQSL were investigated using various types of datasets. The results indicate that synXQSL, with an appropriate choice of bases for training data synthesis, has the potential to improve dMRI parameter estimation in noisy datasets.
{"title":"Synthetic X-Q space learning for diffusion MRI parameter estimation: a pilot study in breast DKI.","authors":"Yoshitaka Masutani, Kousei Konya, Erina Kato, Naoko Mori, Hideki Ota, Shunji Mugikura, Kei Takase, Yuki Ichinoseki","doi":"10.1007/s11548-025-03550-7","DOIUrl":"10.1007/s11548-025-03550-7","url":null,"abstract":"<p><strong>Purpose: </strong>For diffusion MRI (dMRI) parameter estimation, machine-learning approaches have shown promising results so far including the synthetic Q-space learning (synQSL) based on regressor training with only synthetic data. In this study, we aimed at the development of a new method named synthetic X-Q space learning (synXQSL) to improve robustness and investigated the basic characteristics.</p><p><strong>Methods: </strong>For training data, local parameter patterns of 3 × 3 voxels were synthesized by a linear combination of six bases, in which parameters are estimated at the center voxel. We prepared three types of local patterns by choosing the number of bases: flat, linear and quadratic. Then, at each location of 3 × 3 voxels, signal values of the diffusion-weighted image were computed by the signal model equation for diffusional kurtosis imaging and Rician noise simulation. The multi-layer perceptron was used for parameter estimation and was trained for each parameter with various noise levels. The level is controlled by a noise ratio defined as a fraction of the standard deviation in the Rician noise distribution normalized by the average b = 0 signal values. Experiments for visual and quantitative validation were performed with synthetic data, a digital phantom and clinical breast datasets in comparison with the previous methods.</p><p><strong>Results: </strong>By using synthetic datasets, synXQSL outperformed synQSL in the parameter estimation of noisy data sets. Through the digital phantom experiments, the combination of synXQSL bases yields different results and a quadratic pattern could be the reasonable choice. The clinical data experiments indicate that synXQSL suppresses noises in estimated parameter maps and consequently brings higher contrast.</p><p><strong>Conclusion: </strong>The basic characteristics of synXQSL were investigated by using various types of datasets. The results indicate that synXQSL with the appropriate choice of bases in training data synthesis has the potential to improve dMRI parameters in noisy datasets.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2423-2435"},"PeriodicalIF":2.3,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12689713/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145589737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}