Pub Date: 2025-11-03 | DOI: 10.1007/s11548-025-03524-9
Mario A Cypko, Muhammad Agus Salim, Aditya Kumar, Leonard Berliner, Andreas Dietz, Matthaeus Stoehr, Oliver Amft
Purpose: Bayesian networks (BNs) are valuable for clinical decision support due to their transparency and interpretability. However, BN modelling requires considerable manual effort. This study explores how integrating large language models (LLMs) with retrieval-augmented generation (RAG) can improve BN modelling by increasing efficiency, reducing cognitive workload, and ensuring accuracy.
Methods: We developed a web-based BN modelling service that integrates an LLM-RAG pipeline. A fine-tuned GTE-Large embedding model was employed for knowledge retrieval, optimised through recursive chunking and query expansion. To ensure accurate BN suggestions, we defined a causal structure for medical idioms by unifying existing BN frameworks. GPT-4 and Mixtral 8x7B were used to handle complex data interpretation and to generate modelling suggestions, respectively. A user study with four clinicians assessed usability, retrieval accuracy, and cognitive workload using NASA-TLX. The study demonstrated the system's potential for efficient and clinically relevant BN modelling.
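The recursive chunking step used in the retrieval pipeline can be sketched as follows. This is a minimal illustration of the general technique, not the authors' implementation; the function name, chunk size, and separator list are invented for the example.

```python
# Recursive chunking sketch: split on the coarsest separator first,
# recurse with finer separators on any piece still over the size limit.
def chunk_recursive(text, max_len=100, separators=("\n\n", "\n", ". ", " ")):
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        if sep in text:
            parts = text.split(sep)
            chunks, buf = [], ""
            for part in parts:
                candidate = (buf + sep + part) if buf else part
                if len(candidate) <= max_len:
                    buf = candidate            # keep packing this chunk
                else:
                    if buf:
                        chunks.append(buf)
                    if len(part) > max_len:    # still too big: recurse finer
                        chunks.extend(chunk_recursive(part, max_len, separators))
                        buf = ""
                    else:
                        buf = part
            if buf:
                chunks.append(buf)
            return chunks
    # no separator applies: fall back to a hard character split
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

Every returned chunk is guaranteed to fit the embedding model's input budget, which is the property the retrieval step relies on.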
Results: The RAG pipeline improved retrieval accuracy and answer relevance. Recursive chunking with the fine-tuned GTE-Large embedding model achieved the highest retrieval accuracy score (0.9). Query expansion and HyDE optimisation improved retrieval accuracy for semantic chunking (from 0.75 to 0.85). Responses maintained high faithfulness (≥ 0.9). However, the LLM occasionally failed to adhere to predefined causal structures and medical idioms. All clinicians, regardless of BN experience, created comprehensive models within one hour. Experienced clinicians produced more complex models but occasionally introduced causality errors, while less experienced users adhered more accurately to predefined structures. The tool reduced cognitive workload (2/7 on the NASA-TLX) and was described as intuitive, although workflow interruptions and minor technical issues highlighted areas for improvement.
Conclusion: Integrating LLM-RAG into BN modelling enhances efficiency and accuracy. Future work may focus on automated preprocessing, refinements of the user interface, and extending the RAG pipeline with validation steps and external biomedical sources. Generative AI holds promise for expert-driven knowledge modelling.
Title: Large language models with retrieval-augmented generation enhance expert modelling of Bayesian network for clinical decision support.
Journal: International Journal of Computer Assisted Radiology and Surgery.
Pub Date: 2025-11-01 | DOI: 10.1007/s11548-024-03270-4
Rebekka Peter, Sofia Moreira, Eleonora Tagliabue, Matthias Hillenbrand, Rita G Nunes, Franziska Mathis-Ullrich
Title: Correction to: Stereo reconstruction from microscopic images for computer-assisted ophthalmic surgery.
Journal: International Journal of Computer Assisted Radiology and Surgery, pp. 2367-2369.
Pub Date: 2025-11-01 | Epub Date: 2025-06-19 | DOI: 10.1007/s11548-025-03401-5
Hizirwan S Salim, Abdullah Thabit, Sem Hoogteijling, Maryse A van 't Klooster, Theo van Walsum, Maeike Zijlmans, Mohamed Benmahdjoub
Purpose: Epilepsy surgery is a potentially curative treatment for people with focal epilepsy. Intraoperative electrocorticogram (ioECoG) recordings from the brain guide neurosurgeons during resection. Accurate localization of epileptic activity, and thus of the ioECoG grids, is critical for successful outcomes. We aim to develop and evaluate the feasibility of a novel method for localizing small, deformable objects using augmented reality (AR) head-mounted displays (HMDs) and artificial intelligence (AI). AR HMDs combine cameras and patient overlay visualization in a compact design.
Methods: We developed an image processing method for the HoloLens 2 to localize a 64-electrode ioECoG grid even when individual electrodes are indistinguishable due to low resolution. The method combines object detection, super-resolution, and pose estimation AI models with stereo triangulation. A synthetic dataset of 90,000 images trained the super-resolution and pose estimation models. The system was tested in a controlled environment against an optical tracker as ground truth. Accuracy was evaluated at distances between 40 and 90 cm.
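The stereo triangulation step can be illustrated for the simplest rectified-camera case, where depth follows directly from disparity. The parameter values below are invented for illustration and do not reflect the HoloLens 2 calibration.

```python
# Rectified-stereo triangulation sketch: recover a 3D point (metres) from
# one matched pixel pair, given focal length (px), baseline (m), and the
# principal point (cx, cy). All values here are illustrative.
def triangulate_rectified(x_left, x_right, y, focal_px, baseline_m, cx, cy):
    disparity = x_left - x_right           # pixels; positive for finite depth
    if disparity <= 0:
        raise ValueError("non-positive disparity: invalid match or point at infinity")
    z = focal_px * baseline_m / disparity  # depth grows as disparity shrinks
    x = (x_left - cx) * z / focal_px       # back-project to camera coordinates
    y3d = (y - cy) * z / focal_px
    return (x, y3d, z)
```

The inverse relation between disparity and depth is also why accuracy degrades with distance, as the reported results (sub-2 mm at 40 cm, sub-5 mm below 60 cm) reflect.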
Results: The system achieved sub-5 mm accuracy in localizing the ioECoG grid at distances shorter than 60 cm. At 40 cm, the accuracy remained below 2 mm, with an average standard deviation of less than 0.5 mm. At 60 cm, the method processed an average of 24 stereo frames per second.
Conclusion: This study demonstrates the feasibility of localizing small, deformable objects like ioECoG grids using AR HMDs. While results indicate clinically acceptable accuracy, further research is needed to validate the method in clinical environments and assess its impact on surgical precision and outcomes.
Title: Super-resolution for localizing electrode grids as small, deformable objects during epilepsy surgery using augmented reality headsets.
Journal: International Journal of Computer Assisted Radiology and Surgery, pp. 2319-2327.
Pub Date: 2025-11-01 | Epub Date: 2025-06-20 | DOI: 10.1007/s11548-025-03437-7
Mengya Xu, Wenjin Mo, Guankun Wang, Huxin Gao, An Wang, Ning Zhong, Zhen Li, Xiaoxiao Yang, Hongliang Ren
Purpose: The intricate nature of endoscopic surgical environments poses significant challenges for dissection zone segmentation. Specifically, the boundaries between different tissue types lack clarity, which can result in significant segmentation errors, as models may misidentify or overlook object edges altogether. The goal of this work is therefore to provide precise dissection zone suggestions under these conditions during endoscopic submucosal dissection (ESD) procedures and to enhance the overall safety of ESD.
Methods: We introduce a prompt-based dissection zone segmentation (PDZSeg) model, designed to segment dissection zones while incorporating different visual prompts, such as scribbles and bounding boxes. Our approach overlays these visual cues directly onto the images and fine-tunes a foundation model on a specialized dataset built to handle diverse visual prompt instructions. This shift toward more flexible input methods is intended to improve both dissection zone segmentation performance and the overall user experience.
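Overlaying a visual prompt onto the input image, as described, reduces to recoloring the prompt pixels before the image is fed to the model. A minimal sketch; the function name and nested-list pixel format are illustrative, not from the paper.

```python
# Burn a scribble prompt into an image: recolor the prompted pixels so the
# segmentation model sees the cue as part of its input. `image` is an
# H x W grid of RGB triples; the original is left untouched.
def overlay_scribble(image, scribble_pixels, color):
    out = [[list(px) for px in row] for row in image]  # deep copy
    for (r, c) in scribble_pixels:
        out[r][c] = list(color)                        # paint the prompt
    return out
```

In a real pipeline the scribble would come from a user drawing on the endoscopic frame; here it is just a list of (row, col) coordinates.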
Results: We evaluate our approach using three experimental setups: in-domain evaluation, evaluation under variability in visual prompt availability, and robustness assessment. Validated on the ESD-DZSeg dataset, which focuses on the dissection zone segmentation task in ESD, our experimental results show that our solution outperforms state-of-the-art segmentation methods for this task. To the best of our knowledge, this is the first study to incorporate visual prompt design in dissection zone segmentation.
Conclusion: We introduce the prompt-based dissection zone segmentation (PDZSeg) model, which is specifically designed for dissection zone segmentation and can effectively utilize various visual prompts, including scribbles and bounding boxes. The model improves segmentation performance and enhances user experience by integrating a specialized dataset with a novel visual referral method that optimizes the architecture and boosts the effectiveness of dissection zone suggestions. Furthermore, we present the ESD-DZSeg dataset for robot-assisted endoscopic submucosal dissection (ESD), which serves as a benchmark for assessing dissection zone suggestions and visual prompt interpretation, laying the groundwork for future research in this field. Our code is available at https://github.com/FrankMOWJ/PDZSeg .
Title: PDZSeg: adapting the foundation model for dissection zone segmentation with visual prompts in robot-assisted endoscopic submucosal dissection.
Journal: International Journal of Computer Assisted Radiology and Surgery, pp. 2335-2344.
Pub Date: 2025-11-01 | Epub Date: 2025-06-07 | DOI: 10.1007/s11548-025-03432-y
Balázs Faludi, Marek Żelechowski, Maria Licci, Norbert Zentai, Attill Saemann, Daniel Studer, Georg Rauter, Raphael Guzman, Carol Hasler, Gregory F Jost, Philippe C Cattin
Purpose: Planning highly complex surgeries in virtual reality (VR) provides a user-friendly and natural way to navigate volumetric medical data and can improve the sense of depth and scale. Using ray marching-based volume rendering to display the data has several benefits over traditional mesh-based rendering, such as offering a more accurate and detailed visualization without the need for prior segmentation and meshing. However, volume rendering can be difficult to extend to support multiple intersecting volumes in a scene while maintaining a high enough update rate for a comfortable user experience in VR.
Methods: Upon loading a volume, a rough ad hoc segmentation is performed using a motion-tracked controller. The segmentation is not used to extract a surface mesh and does not need to precisely define the exact surfaces to be rendered, as it only serves to separate the volume into individual sub-volumes, which are rendered in multiple, consecutive volume rendering passes. For each pass, the ray lengths are written into the camera depth buffer at early ray termination and read in subsequent passes to ensure correct occlusion between individual volumes.
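The depth-buffer handoff between rendering passes can be sketched for a single ray. This is a deliberate simplification of the described method (one front-to-back sample list per pixel, opacity-threshold early termination); names and values are invented.

```python
# One volume-rendering pass for a single pixel's ray. Samples are
# (depth, alpha) pairs sorted front to back. A later pass stops marching
# as soon as its ray would go behind a depth already written by an
# earlier pass, which is what produces correct inter-volume occlusion.
def render_pass(volume_samples, depth_buffer, pixel, opacity_threshold=0.95):
    accumulated = 0.0
    for depth, alpha in volume_samples:
        if depth >= depth_buffer[pixel]:          # occluded by an earlier pass
            break
        accumulated += (1.0 - accumulated) * alpha  # front-to-back compositing
        if accumulated >= opacity_threshold:      # early ray termination:
            depth_buffer[pixel] = min(depth_buffer[pixel], depth)  # record depth
            break
    return accumulated
```

Running one such pass per sub-volume, in any order, leaves the depth buffer holding the nearest opaque surface seen so far, so intersecting volumes occlude each other correctly.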
Results: We evaluate the performance of the multi-volume renderer using three different use cases and corresponding datasets. We show that the presented approach can avoid dropped frames at the typical update rate of 90 frames per second of a desktop-based VR system and, therefore, provide a comfortable user experience even in the presence of more than twenty individual volumes.
Conclusion: Our proof-of-concept implementation shows the feasibility of VR-based surgical planning systems, which require dynamic and direct manipulation of the original volumetric data without sacrificing rendering performance and user experience.
Title: Multi-volume rendering using depth buffers for surgical planning in virtual reality.
Journal: International Journal of Computer Assisted Radiology and Surgery, pp. 2251-2258.
Pub Date: 2025-11-01 | Epub Date: 2025-07-01 | DOI: 10.1007/s11548-025-03455-5
Karin A Olthof, Matteo Fusaglia, Anne G den Hartog, Niels F M Kok, Theo J M Ruers, Koert F D Kuhlmann
Purpose: Understanding patient-specific liver anatomy is crucial for patient safety and achieving complete treatment of all tumors during surgery. This study evaluates the impact of the use of patient-specific 3D liver models and surgical navigation on procedural complexity in open liver surgery.
Methods: Patients with colorectal liver metastases scheduled for open liver surgery were included between June 2022 and October 2023 at the Netherlands Cancer Institute. Patient-specific 3D liver models could be used upon request during the surgical procedure. Subsequently, surgeons could request additional surgical navigation by landmark registration using an electromagnetically tracked ultrasound transducer. Postoperatively, surgeons assessed the impact of the use of the model and navigation on procedural complexity on a scale from 1 to 10.
Results: Thirty-five patients were included in this study, with a median of 8 tumors (range 3-25). 3D models were utilized in all procedures. Additional navigation was requested in 21/35 patients to improve intraoperative planning and tumor localization. The mean procedural complexity score with navigation was 4.3 (95% CI [3.7, 5.0]), compared to 7.8 (95% CI [6.6, 9.0]) with the 3D model alone. Both visualization methods improved lesion localization and provided better anatomical insight.
Conclusion: 3D models and surgical navigation significantly reduce the complexity of open liver surgery, especially in patients with bilobar disease. These tools enhance intraoperative decision-making and may lead to better surgical outcomes. The stepwise implementation of the visualization techniques in this study underscores the added benefit of surgical navigation beyond 3D modeling alone, supporting its potential for broader clinical implementation.
Title: The impact of 3-dimensional models and surgical navigation for open liver surgery.
Journal: International Journal of Computer Assisted Radiology and Surgery, pp. 2213-2218.
Pub Date: 2025-11-01 | Epub Date: 2025-04-25 | DOI: 10.1007/s11548-025-03366-5
Rebecca Hisey, Henry Lee, Adrienne Duimering, John Liu, Vasudha Gupta, Tamas Ungi, Christine Law, Gabor Fichtinger, Matthew Holden
Objective: Video offers an accessible method for automated surgical skill evaluation; however, many platforms still rely on traditional six-degree-of-freedom (6-DOF) tracking systems, which can be costly, cumbersome, and challenging to apply clinically. This study aims to demonstrate that trainee skill in cataract surgery can be assessed effectively using only object detection from monocular surgical microscope video.
Methods: One ophthalmologist and four residents performed cataract surgery on a simulated eye five times each, generating 25 recordings. Recordings included both the surgical microscope video and 6-DOF instrument tracking data. Videos were graded by two expert ophthalmologists using the ICO-OSCAR:SICS rubric. We computed motion-based metrics using both object detection from video and 6-DOF tracking. We first examined correlations between each metric and expert scores for each rubric criterion. Then, using these findings, we trained an ordinal regression model to predict scores from each tracking modality and compared correlation strengths with expert scores.
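Correlating a motion metric with ordinal expert scores is commonly done with Spearman rank correlation; the abstract does not name the statistic, so treat this as an assumed, self-contained sketch with invented data.

```python
# Spearman rank correlation: Pearson correlation computed on ranks,
# with ties assigned their average rank. Suitable for ordinal rubric
# scores, where only monotone agreement matters.
def spearman(xs, ys):
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and vals[order[j + 1]] == vals[order[i]]:
                j += 1                              # extend the tie group
            avg = (i + j) / 2.0 + 1.0               # average rank (1-based)
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

In practice one would call this once per (metric, rubric criterion) pair and compare the resulting coefficients across tracking modalities.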
Results: Metrics from object detection generally showed stronger correlations with expert scores than 6-DOF tracking. For score prediction, 6-DOF tracking showed no significant advantage, while scores predicted from object detection achieved significantly stronger correlations with expert scores for four scoring criteria.
Conclusion: Our results indicate that skill assessment from monocular surgical microscope video can match, and in some cases exceed, the correlation strengths of 6-DOF tracking assessments. This finding supports the feasibility of using object detection for skill assessment without additional hardware.
Title: Objective skill assessment for cataract surgery from surgical microscope video.
Journal: International Journal of Computer Assisted Radiology and Surgery, pp. 2219-2230.
Pub Date : 2025-11-01Epub Date: 2025-05-30DOI: 10.1007/s11548-025-03426-w
Ping-Cheng Ku, Mingxu Liu, Robert Grupp, Andrew Harris, Julius K Oni, Simon C Mears, Alejandro Martin-Gomez, Mehran Armand
Purpose: Soft tissue pathologies and bone defects are not easily visible in intra-operative fluoroscopic images; therefore, we develop an end-to-end MRI-to-fluoroscopic image registration framework, aiming to enhance intra-operative visualization for surgeons during orthopedic procedures.
Methods: The proposed framework utilizes deep learning to segment MRI scans and generate synthetic CT (sCT) volumes. These sCT volumes are then used to produce digitally reconstructed radiographs (DRRs), enabling 2D/3D registration with intra-operative fluoroscopic images. The framework's performance was validated through simulation and cadaver studies for core decompression (CD) surgery, focusing on the registration accuracy of femur and pelvic regions.
Results: The framework achieved a mean translational registration accuracy of 2.4 ± 1.0 mm and rotational accuracy of 1.6 ± for the femoral region in cadaver studies. The method successfully enabled intra-operative visualization of necrotic lesions that were not visible on conventional fluoroscopic images, marking a significant advancement in image guidance for femur and pelvic surgeries.
Conclusion: The MRI-to-fluoroscopic registration framework offers a novel approach to image guidance in orthopedic surgeries, exclusively using MRI without the need for CT scans. This approach enhances the visualization of soft tissues and bone defects, reduces radiation exposure, and provides a safer, more effective alternative for intra-operative surgical guidance.
{"title":"End-to-end 2D/3D registration from pre-operative MRI to intra-operative fluoroscopy for orthopedic procedures.","authors":"Ping-Cheng Ku, Mingxu Liu, Robert Grupp, Andrew Harris, Julius K Oni, Simon C Mears, Alejandro Martin-Gomez, Mehran Armand","doi":"10.1007/s11548-025-03426-w","DOIUrl":"10.1007/s11548-025-03426-w","url":null,"abstract":"<p><strong>Purpose: </strong>Soft tissue pathologies and bone defects are not easily visible in intra-operative fluoroscopic images; therefore, we develop an end-to-end MRI-to-fluoroscopic image registration framework, aiming to enhance intra-operative visualization for surgeons during orthopedic procedures.</p><p><strong>Methods: </strong>The proposed framework utilizes deep learning to segment MRI scans and generate synthetic CT (sCT) volumes. These sCT volumes are then used to produce digitally reconstructed radiographs (DRRs), enabling 2D/3D registration with intra-operative fluoroscopic images. The framework's performance was validated through simulation and cadaver studies for core decompression (CD) surgery, focusing on the registration accuracy of femur and pelvic regions.</p><p><strong>Results: </strong>The framework achieved a mean translational registration accuracy of 2.4 ± 1.0 mm and rotational accuracy of 1.6 ± <math><mrow><mn>0</mn> <mo>.</mo> <msup><mn>8</mn> <mo>∘</mo></msup> </mrow> </math> for the femoral region in cadaver studies. The method successfully enabled intra-operative visualization of necrotic lesions that were not visible on conventional fluoroscopic images, marking a significant advancement in image guidance for femur and pelvic surgeries.</p><p><strong>Conclusion: </strong>The MRI-to-fluoroscopic registration framework offers a novel approach to image guidance in orthopedic surgeries, exclusively using MRI without the need for CT scans. 
This approach enhances the visualization of soft tissues and bone defects, reduces radiation exposure, and provides a safer, more effective alternative for intra-operative surgical guidance.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2355-2366"},"PeriodicalIF":2.3,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144188521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-11-01Epub Date: 2025-05-01DOI: 10.1007/s11548-025-03379-0
Wenhao Gu, Justin D Opfermann, Jonathan Knopf, Axel Krieger, Mathias Unberath
Purpose: Mixed reality for surgical navigation is an emerging tool for precision surgery. Achieving reliable surgical guidance hinges on robust tracking of the mixed reality device relative to patient anatomy. Contemporary approaches either introduce bulky fiducials that need to be invasively attached to the anatomy or make strong assumptions about the patient remaining stationary.
Methods: We present an approach to anatomy tracking that relies on biocompatible near-infrared fluorescent (NIRF) dots. Dots are quickly placed on the anatomy intra-operatively and the pose is tracked reliably via PnP-type methods. We demonstrate the potential of our NIRF dots approach to track patient movements after image registration on a 3D printed model, simulating an image-guided navigation process with a tablet-based mixed reality scenario.
Results: The dot-based pose tracking demonstrated an average accuracy of 1.13 mm in translation and 0.69 degrees in rotation under static conditions, and 1.39 mm/1.10 degrees, respectively, under dynamic conditions.
Conclusion: Our results are promising and encourage further research in the viability of integrating NIRF dots in mixed reality surgical navigation. These biocompatible dots may allow for reliable tracking of patient motion post-registration, providing a convenient alternative to invasive marker arrays. While our initial tests used a tablet, adaptation to head-mounted displays is plausible with suitable sensors.
{"title":"Near-infrared beacons: tracking anatomy with biocompatible fluorescent dots for mixed reality surgical navigation.","authors":"Wenhao Gu, Justin D Opfermann, Jonathan Knopf, Axel Krieger, Mathias Unberath","doi":"10.1007/s11548-025-03379-0","DOIUrl":"10.1007/s11548-025-03379-0","url":null,"abstract":"<p><strong>Purpose: </strong>Mixed reality for surgical navigation is an emerging tool for precision surgery. Achieving reliable surgical guidance hinges on robust tracking of the mixed reality device relative to patient anatomy. Contemporary approaches either introduce bulky fiducials that need to be invasively attached to the anatomy or make strong assumptions about the patient remaining stationary.</p><p><strong>Methods: </strong>We present an approach to anatomy tracking that relies on biocompatible near-infrared fluorescent (NIRF) dots. Dots are quickly placed on the anatomy intra-operatively and the pose is tracked reliably via PnP-type methods. We demonstrate the potential of our NIRF dots approach to track patient movements after image registration on a 3D printed model, simulating an image-guided navigation process with a tablet-based mixed reality scenario.</p><p><strong>Results: </strong>The dot-based pose tracking demonstrated an average accuracy of 1.13 mm in translation and 0.69 degrees in rotation under static conditions, and 1.39 mm/1.10 degrees, respectively, under dynamic conditions.</p><p><strong>Conclusion: </strong>Our results are promising and encourage further research in the viability of integrating NIRF dots in mixed reality surgical navigation. These biocompatible dots may allow for reliable tracking of patient motion post-registration, providing a convenient alternative to invasive marker arrays. 
While our initial tests used a tablet, adaptation to head-mounted displays is plausible with suitable sensors.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2309-2318"},"PeriodicalIF":2.3,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144063150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-11-01Epub Date: 2025-05-26DOI: 10.1007/s11548-025-03422-0
Lunchi Guo, Dennis Trujillo, James R Duncan, M Allan Thomas
Purpose: Accurate patient dosimetry estimates from fluoroscopically-guided interventions (FGIs) are hindered by limited knowledge of the specific anatomy that was irradiated. Current methods use data reported by the equipment to estimate the patient anatomy exposed during each irradiation event. We propose a deep learning algorithm to automatically match 2D fluoroscopic images with corresponding anatomical regions in computational phantoms, enabling more precise patient dose estimates.
Methods: Our method involves two main steps: (1) simulating 2D fluoroscopic images, and (2) developing a deep learning algorithm to predict anatomical coordinates from these images. For part (1), we utilized DeepDRR for fast and realistic simulation of 2D x-ray images from 3D computed tomography datasets. We generated a diverse set of simulated fluoroscopic images from various regions with different field sizes. In part (2), we employed a Residual Neural Network (ResNet) architecture combined with metadata processing to effectively integrate patient-specific information (age and gender) to learn the transformation between 2D images and specific anatomical coordinates in each representative phantom. For the Modified ResNet model, we defined an allowable error range of ± 10 mm.
Results: The proposed method achieved over 90% of predictions within ± 10 mm, with strong alignment between predicted and true coordinates as confirmed by Bland-Altman analysis. Most errors were within ± 2%, with outliers beyond ± 5% primarily in Z-coordinates for infant phantoms due to their limited representation in the training data. These findings highlight the model's accuracy and its potential for precise spatial localization, while emphasizing the need for improved performance in specific anatomical regions.
Conclusion: In this work, a comprehensive simulated 2D fluoroscopy image dataset was developed, addressing the scarcity of real clinical datasets and enabling effective training of deep-learning models. The modified ResNet successfully achieved precise prediction of anatomical coordinates from the simulated fluoroscopic images, enabling the goal of more accurate patient-specific dosimetry.
{"title":"Training a deep learning model to predict the anatomy irradiated in fluoroscopic x-ray images.","authors":"Lunchi Guo, Dennis Trujillo, James R Duncan, M Allan Thomas","doi":"10.1007/s11548-025-03422-0","DOIUrl":"10.1007/s11548-025-03422-0","url":null,"abstract":"<p><strong>Purpose: </strong>Accurate patient dosimetry estimates from fluoroscopically-guided interventions (FGIs) are hindered by limited knowledge of the specific anatomy that was irradiated. Current methods use data reported by the equipment to estimate the patient anatomy exposed during each irradiation event. We propose a deep learning algorithm to automatically match 2D fluoroscopic images with corresponding anatomical regions in computational phantoms, enabling more precise patient dose estimates.</p><p><strong>Methods: </strong>Our method involves two main steps: (1) simulating 2D fluoroscopic images, and (2) developing a deep learning algorithm to predict anatomical coordinates from these images. For part (1), we utilized DeepDRR for fast and realistic simulation of 2D x-ray images from 3D computed tomography datasets. We generated a diverse set of simulated fluoroscopic images from various regions with different field sizes. In part (2), we employed a Residual Neural Network (ResNet) architecture combined with metadata processing to effectively integrate patient-specific information (age and gender) to learn the transformation between 2D images and specific anatomical coordinates in each representative phantom. For the Modified ResNet model, we defined an allowable error range of ± 10 mm.</p><p><strong>Results: </strong>The proposed method achieved over 90% of predictions within ± 10 mm, with strong alignment between predicted and true coordinates as confirmed by Bland-Altman analysis. Most errors were within ± 2%, with outliers beyond ± 5% primarily in Z-coordinates for infant phantoms due to their limited representation in the training data. 
These findings highlight the model's accuracy and its potential for precise spatial localization, while emphasizing the need for improved performance in specific anatomical regions.</p><p><strong>Conclusion: </strong>In this work, a comprehensive simulated 2D fluoroscopy image dataset was developed, addressing the scarcity of real clinical datasets and enabling effective training of deep-learning models. The modified ResNet successfully achieved precise prediction of anatomical coordinates from the simulated fluoroscopic images, enabling the goal of more accurate patient-specific dosimetry.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2345-2353"},"PeriodicalIF":2.3,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144144393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}