PDZSeg: adapting the foundation model for dissection zone segmentation with visual prompts in robot-assisted endoscopic submucosal dissection
Pub Date: 2025-11-01 | Epub Date: 2025-06-20 | DOI: 10.1007/s11548-025-03437-7
Mengya Xu, Wenjin Mo, Guankun Wang, Huxin Gao, An Wang, Ning Zhong, Zhen Li, Xiaoxiao Yang, Hongliang Ren
Purpose: The intricate nature of endoscopic surgical environments poses significant challenges for dissection zone segmentation. In particular, the boundaries between different tissue types lack clarity, which can lead to substantial segmentation errors, as models may misidentify or overlook object edges altogether. The goal of this work is therefore to provide precise dissection zone suggestions under these challenging conditions during endoscopic submucosal dissection (ESD) procedures and to enhance the overall safety of ESD.
Methods: We introduce a prompt-based dissection zone segmentation (PDZSeg) model designed to segment dissection zones while incorporating different visual prompts, such as scribbles and bounding boxes. Our approach overlays these visual cues directly onto the images and fine-tunes a foundation model on a specialized dataset created to handle diverse visual prompt instructions. This shift toward more flexible input methods is intended to improve both dissection zone segmentation performance and the overall user experience.
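A minimal sketch of the prompt-overlay idea described above, assuming OpenCV is used to draw a scribble or bounding box directly onto a frame before it is passed to the segmentation backbone; the function name, prompt format, and colors are illustrative assumptions, not the authors' implementation.

```python
import cv2
import numpy as np

def overlay_visual_prompt(image, prompt_type, prompt, color=(0, 255, 0), thickness=3):
    """Draw a visual prompt (bounding box or scribble) directly onto an endoscopic frame.

    image:       HxWx3 uint8 BGR frame
    prompt_type: "box" or "scribble"
    prompt:      (x1, y1, x2, y2) for a box, or a list of (x, y) points for a scribble
    """
    out = image.copy()
    if prompt_type == "box":
        x1, y1, x2, y2 = prompt
        cv2.rectangle(out, (x1, y1), (x2, y2), color, thickness)
    elif prompt_type == "scribble":
        pts = np.asarray(prompt, dtype=np.int32).reshape(-1, 1, 2)
        cv2.polylines(out, [pts], isClosed=False, color=color, thickness=thickness)
    else:
        raise ValueError(f"unknown prompt type: {prompt_type}")
    return out

# Hypothetical scribble near the intended dissection zone on a stand-in frame;
# the prompted frame would then be fed to the fine-tuned segmentation model.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
scribble = [(120, 300), (180, 290), (240, 310), (300, 305)]
prompted = overlay_visual_prompt(frame, "scribble", scribble)
```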
Results: We evaluate our approach using three experimental setups: in-domain evaluation, evaluation under variability in visual prompt availability, and robustness assessment. Validated on the ESD-DZSeg dataset, which focuses on the dissection zone segmentation task in ESD, our solution outperforms state-of-the-art segmentation methods for this task. To the best of our knowledge, this is the first study to incorporate visual prompt design in dissection zone segmentation.
Conclusion: We introduce the prompt-based dissection zone segmentation (PDZSeg) model, which is specifically designed for dissection zone segmentation and can effectively utilize various visual prompts, including scribbles and bounding boxes. By integrating a specialized dataset with a novel visual referral method that optimizes the architecture, the model improves segmentation performance, enhances the user experience, and boosts the effectiveness of dissection zone suggestions. Furthermore, we present the ESD-DZSeg dataset for robot-assisted endoscopic submucosal dissection (ESD), which serves as a benchmark for assessing dissection zone suggestions and visual prompt interpretation, laying the groundwork for future research in this field. Our code is available at https://github.com/FrankMOWJ/PDZSeg.
{"title":"PDZSeg: adapting the foundation model for dissection zone segmentation with visual prompts in robot-assisted endoscopic submucosal dissection.","authors":"Mengya Xu, Wenjin Mo, Guankun Wang, Huxin Gao, An Wang, Ning Zhong, Zhen Li, Xiaoxiao Yang, Hongliang Ren","doi":"10.1007/s11548-025-03437-7","DOIUrl":"10.1007/s11548-025-03437-7","url":null,"abstract":"<p><strong>Purpose: </strong>The intricate nature of endoscopic surgical environments poses significant challenges for the task of dissection zone segmentation. Specifically, the boundaries between different tissue types lack clarity, which can result in significant segmentation errors, as the models may misidentify or overlook object edges altogether. Thus, the goal of this work is to achieve the precise dissection zone suggestion under these challenges during endoscopic submucosal dissection (ESD) procedures and enhance the overall safety of ESD.</p><p><strong>Methods: </strong>We introduce a prompted-based dissection zone segmentation (PDZSeg) model, aimed at segmenting dissection zones and specifically designed to incorporate different visual prompts, such as scribbles and bounding boxes. Our approach overlays these visual cues directly onto the images, utilizing fine-tuning of the foundational model on a specialized dataset created to handle diverse visual prompt instructions. This shift toward more flexible input methods is intended to significantly improve both the performance of dissection zone segmentation and the overall user experience.</p><p><strong>Results: </strong>We evaluate our approaches using the three experimental setups: in-domain evaluation, evaluation under variability in visual prompts availability, and robustness assessment. By validating our approaches on the ESD-DZSeg dataset, specifically focused on the dissection zone segmentation task of ESD, our experimental results show that our solution outperforms state-of-the-art segmentation methods for this task. To the best of our knowledge, this is the first study to incorporate visual prompt design in dissection zone segmentation.</p><p><strong>Conclusion: </strong>We introduce the prompted-based dissection zone segmentation (PDZSeg) model, which is specifically designed for dissection zone segmentation and can effectively utilize various visual prompts, including scribbles and bounding boxes. This model improves segmentation performance and enhances user experience by integrating a specialized dataset with a novel visual referral method that optimizes the architecture and boosts the effectiveness of dissection zone suggestions. Furthermore, we present the ESD-DZSeg dataset for robot-assisted endoscopic submucosal dissection (ESD), which serves as a benchmark for assessing dissection zone suggestions and visual prompt interpretation, thus laying the groundwork for future research in this field. 
Our code is available at https://github.com/FrankMOWJ/PDZSeg .</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2335-2344"},"PeriodicalIF":2.3,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12575525/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144337234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-volume rendering using depth buffers for surgical planning in virtual reality
Pub Date: 2025-11-01 | Epub Date: 2025-06-07 | DOI: 10.1007/s11548-025-03432-y
Balázs Faludi, Marek Żelechowski, Maria Licci, Norbert Zentai, Attill Saemann, Daniel Studer, Georg Rauter, Raphael Guzman, Carol Hasler, Gregory F Jost, Philippe C Cattin
Purpose: Planning highly complex surgeries in virtual reality (VR) provides a user-friendly and natural way to navigate volumetric medical data and can improve the sense of depth and scale. Using ray marching-based volume rendering to display the data has several benefits over traditional mesh-based rendering, such as offering a more accurate and detailed visualization without the need for prior segmentation and meshing. However, volume rendering can be difficult to extend to support multiple intersecting volumes in a scene while maintaining a high enough update rate for a comfortable user experience in VR.
Methods: Upon loading a volume, a rough ad hoc segmentation is performed using a motion-tracked controller. The segmentation is not used to extract a surface mesh and does not need to precisely define the exact surfaces to be rendered, as it only serves to separate the volume into individual sub-volumes, which are rendered in multiple, consecutive volume rendering passes. For each pass, the ray lengths are written into the camera depth buffer at early ray termination and read in subsequent passes to ensure correct occlusion between individual volumes.
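The per-pixel logic can be illustrated with a toy CPU sketch: each pass marches one ray front to back, records the ray length at early ray termination, and any later pass stops marching at the stored depth, so nearer sub-volumes occlude farther ones. This is an illustrative NumPy example with analytic sphere "volumes" and white emission, not the GPU renderer described in the paper.

```python
import numpy as np

def march_ray(sample_density, origin, direction, t_max, depth_limit,
              step=0.01, step_opacity=0.08, termination=0.95):
    """Front-to-back ray marching for one ray of one volume pass.

    Marching stops either at early ray termination (accumulated opacity near 1)
    or at depth_limit, the ray length written to the depth buffer by an earlier
    pass. Returns (radiance, opacity, termination_depth); termination_depth is
    inf if the ray never reached the termination threshold.
    """
    radiance, opacity, t = 0.0, 0.0, 0.0
    while t < min(t_max, depth_limit):
        if sample_density(origin + t * direction) > 0.0:
            w = (1.0 - opacity) * step_opacity
            radiance += w            # white emission, for simplicity
            opacity += w
            if opacity >= termination:
                return radiance, opacity, t   # early termination: write t to the depth buffer
        t += step
    return radiance, opacity, np.inf

# Two hypothetical sub-volumes (spheres) rendered in consecutive passes for one ray.
def sphere(center, radius):
    return lambda p: 1.0 if np.linalg.norm(p - np.asarray(center)) < radius else 0.0

origin, direction = np.array([0.0, 0.0, -2.0]), np.array([0.0, 0.0, 1.0])
depth = np.inf                                               # per-pixel depth buffer entry
c1, a1, d1 = march_ray(sphere([0, 0, 0.0], 0.5), origin, direction, 5.0, depth)
depth = min(depth, d1)
# The second pass is clipped by the depth written in the first pass, giving correct occlusion.
c2, a2, d2 = march_ray(sphere([0, 0, 1.0], 0.5), origin, direction, 5.0, depth)
```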
Results: We evaluate the performance of the multi-volume renderer using three different use cases and corresponding datasets. We show that the presented approach can avoid dropped frames at the typical update rate of 90 frames per second of a desktop-based VR system and, therefore, provide a comfortable user experience even in the presence of more than twenty individual volumes.
Conclusion: Our proof-of-concept implementation shows the feasibility of VR-based surgical planning systems, which require dynamic and direct manipulation of the original volumetric data without sacrificing rendering performance and user experience.
{"title":"Multi-volume rendering using depth buffers for surgical planning in virtual reality.","authors":"Balázs Faludi, Marek Żelechowski, Maria Licci, Norbert Zentai, Attill Saemann, Daniel Studer, Georg Rauter, Raphael Guzman, Carol Hasler, Gregory F Jost, Philippe C Cattin","doi":"10.1007/s11548-025-03432-y","DOIUrl":"10.1007/s11548-025-03432-y","url":null,"abstract":"<p><strong>Purpose: </strong>Planning highly complex surgeries in virtual reality (VR) provides a user-friendly and natural way to navigate volumetric medical data and can improve the sense of depth and scale. Using ray marching-based volume rendering to display the data has several benefits over traditional mesh-based rendering, such as offering a more accurate and detailed visualization without the need for prior segmentation and meshing. However, volume rendering can be difficult to extend to support multiple intersecting volumes in a scene while maintaining a high enough update rate for a comfortable user experience in VR.</p><p><strong>Methods: </strong>Upon loading a volume, a rough ad hoc segmentation is performed using a motion-tracked controller. The segmentation is not used to extract a surface mesh and does not need to precisely define the exact surfaces to be rendered, as it only serves to separate the volume into individual sub-volumes, which are rendered in multiple, consecutive volume rendering passes. For each pass, the ray lengths are written into the camera depth buffer at early ray termination and read in subsequent passes to ensure correct occlusion between individual volumes.</p><p><strong>Results: </strong>We evaluate the performance of the multi-volume renderer using three different use cases and corresponding datasets. We show that the presented approach can avoid dropped frames at the typical update rate of 90 frames per second of a desktop-based VR system and, therefore, provide a comfortable user experience even in the presence of more than twenty individual volumes.</p><p><strong>Conclusion: </strong>Our proof-of-concept implementation shows the feasibility of VR-based surgical planning systems, which require dynamic and direct manipulation of the original volumetric data without sacrificing rendering performance and user experience.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2251-2258"},"PeriodicalIF":2.3,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12575470/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144250776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The impact of 3-dimensional models and surgical navigation for open liver surgery
Pub Date: 2025-11-01 | Epub Date: 2025-07-01 | DOI: 10.1007/s11548-025-03455-5
Karin A Olthof, Matteo Fusaglia, Anne G den Hartog, Niels F M Kok, Theo J M Ruers, Koert F D Kuhlmann
Purpose: Understanding patient-specific liver anatomy is crucial for patient safety and achieving complete treatment of all tumors during surgery. This study evaluates the impact of the use of patient-specific 3D liver models and surgical navigation on procedural complexity in open liver surgery.
Methods: Patients with colorectal liver metastases scheduled for open liver surgery were included between June 2022 and October 2023 at the Netherlands Cancer Institute. Patient-specific 3D liver models could be used upon request during the surgical procedure. Subsequently, surgeons could request additional surgical navigation by landmark registration using an electromagnetically tracked ultrasound transducer. Postoperatively, surgeons assessed the impact of the use of the model and navigation on procedural complexity on a scale from 1 to 10.
Results: Thirty-five patients were included in this study, with a median of 8 tumors per patient (range 3-25). 3D models were utilized in all procedures. Additional navigation was requested in 21 of 35 patients to improve intraoperative planning and tumor localization. The mean procedural complexity score with navigation was 4.3 (95% CI [3.7, 5.0]), compared to 7.8 (95% CI [6.6, 9.0]) with the 3D model alone. Both visualization methods improved lesion localization and provided better anatomical insight.
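For illustration only, a mean complexity score with a 95% confidence interval of the kind reported above could be computed as follows (a t-based interval over hypothetical ratings; the study does not state which interval method was used):

```python
import numpy as np
from scipy import stats

def mean_ci(scores, confidence=0.95):
    """Mean and t-based confidence interval for a set of 1-10 complexity ratings."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    sem = stats.sem(scores)                      # standard error of the mean
    half_width = sem * stats.t.ppf((1 + confidence) / 2, df=len(scores) - 1)
    return mean, (mean - half_width, mean + half_width)

# Hypothetical ratings, not the study data:
with_navigation = [4, 5, 3, 4, 5, 4, 4, 3, 5, 6]
m, (low, high) = mean_ci(with_navigation)
print(f"mean {m:.1f}, 95% CI [{low:.1f}, {high:.1f}]")
```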
Conclusion: 3D models and surgical navigation significantly reduce the complexity of open liver surgery, especially in patients with bilobar disease. These tools enhance intraoperative decision-making and may lead to better surgical outcomes. The stepwise implementation of the visualization techniques in this study underscores the added benefit of surgical navigation beyond 3D modeling alone, supporting its potential for broader clinical implementation.
{"title":"The impact of 3-dimensional models and surgical navigation for open liver surgery.","authors":"Karin A Olthof, Matteo Fusaglia, Anne G den Hartog, Niels F M Kok, Theo J M Ruers, Koert F D Kuhlmann","doi":"10.1007/s11548-025-03455-5","DOIUrl":"10.1007/s11548-025-03455-5","url":null,"abstract":"<p><strong>Purpose: </strong>Understanding patient-specific liver anatomy is crucial for patient safety and achieving complete treatment of all tumors during surgery. This study evaluates the impact of the use of patient-specific 3D liver models and surgical navigation on procedural complexity in open liver surgery.</p><p><strong>Methods: </strong>Patients with colorectal liver metastases scheduled for open liver surgery were included between June 2022 and October 2023 at the Netherlands Cancer Institute. Patient-specific 3D liver models could be used upon request during the surgical procedure. Subsequently, surgeons could request additional surgical navigation by landmark registration using an electromagnetically tracked ultrasound transducer. Postoperatively, surgeons assessed the impact of the use of the model and navigation on procedural complexity on a scale from 1 to 10.</p><p><strong>Results: </strong>35 patients were included in this study, with a median number of 8 (ranging from 3 to 25) tumors. 3D models were utilized in all procedures. Additional navigation was requested in 21/35 of patients to improve intraoperative planning and tumor localization. The mean procedural complexity score with navigation was 4.3 (95% CI [3.7, 5.0]), compared to 7.8 (95% CI [6.6, 9.0]) with the 3D model alone. Both visualization methods improved lesion localization and provided better anatomical insight.</p><p><strong>Conclusion: </strong>3D models and surgical navigation significantly reduce the complexity of open liver surgery, especially in patients with bilobar disease. These tools enhance intraoperative decision-making and may lead to better surgical outcomes. The stepwise implementation of the visualization techniques in this study underscores the added benefit of surgical navigation beyond 3D modeling alone, supporting its potential for broader clinical implementation.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2213-2218"},"PeriodicalIF":2.3,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12575497/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144546069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Objective skill assessment for cataract surgery from surgical microscope video
Pub Date: 2025-11-01 | Epub Date: 2025-04-25 | DOI: 10.1007/s11548-025-03366-5
Rebecca Hisey, Henry Lee, Adrienne Duimering, John Liu, Vasudha Gupta, Tamas Ungi, Christine Law, Gabor Fichtinger, Matthew Holden
Objective: Video offers an accessible method for automated surgical skill evaluation; however, many platforms still rely on traditional six-degree-of-freedom (6-DOF) tracking systems, which can be costly, cumbersome, and challenging to apply clinically. This study aims to demonstrate that trainee skill in cataract surgery can be assessed effectively using only object detection from monocular surgical microscope video.
Methods: One ophthalmologist and four residents performed cataract surgery on a simulated eye five times each, generating 25 recordings. Recordings included both the surgical microscope video and 6-DOF instrument tracking data. Videos were graded by two expert ophthalmologists using the ICO-OSCAR:SICS rubric. We computed motion-based metrics using both object detection from video and 6-DOF tracking. We first examined correlations between each metric and the expert scores for each rubric criterion. Then, using these findings, we trained an ordinal regression model to predict scores from each tracking modality and compared correlation strengths with expert scores.
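A minimal sketch of the kind of motion-based metric and correlation analysis described above: a path-length metric computed from per-frame detection centroids, compared against expert scores with a rank correlation. The metric, the data, and the use of Spearman correlation here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import spearmanr

def path_length(centroids):
    """Total 2D path length (pixels) of an instrument tip tracked by object detection.

    centroids: Nx2 array of per-frame bounding-box centers for one instrument.
    """
    centroids = np.asarray(centroids, dtype=float)
    return np.linalg.norm(np.diff(centroids, axis=0), axis=1).sum()

# Hypothetical data: one path-length value per recording and the matching expert
# rubric scores; the rank correlation is one plausible way to compare them.
metrics = [1530.2, 980.7, 2210.4, 1745.9, 1120.3]
expert_scores = [3, 5, 2, 3, 4]
rho, p_value = spearmanr(metrics, expert_scores)
```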
Results: Metrics from object detection generally showed stronger correlations with expert scores than 6-DOF tracking. For score prediction, 6-DOF tracking showed no significant advantage, while scores predicted from object detection achieved significantly stronger correlations with expert scores for four scoring criteria.
Conclusion: Our results indicate that skill assessment from monocular surgical microscope video can match, and in some cases exceed, the correlation strengths of 6-DOF tracking assessments. This finding supports the feasibility of using object detection for skill assessment without additional hardware.
{"title":"Objective skill assessment for cataract surgery from surgical microscope video.","authors":"Rebecca Hisey, Henry Lee, Adrienne Duimering, John Liu, Vasudha Gupta, Tamas Ungi, Christine Law, Gabor Fichtinger, Matthew Holden","doi":"10.1007/s11548-025-03366-5","DOIUrl":"10.1007/s11548-025-03366-5","url":null,"abstract":"<p><strong>Objective: </strong>Video offers an accessible method for automated surgical skill evaluation; however, many platforms still rely on traditional six-degree-of-freedom (6-DOF) tracking systems, which can be costly, cumbersome, and challenging to apply clinically. This study aims to demonstrate that trainee skill in cataract surgery can be assessed effectively using only object detection from monocular surgical microscope video.</p><p><strong>Methods: </strong>One ophthalmologist and four residents performed cataract surgery on a simulated eye five times each, generating 25 recordings. Recordings included both the surgical microscope video and 6-DOF instrument tracking data. Videos were graded by two expert ophthalmologists using the ICO-OSCAR:SICS rubric. We computed motion-based metrics using both object detection from video and 6-DOF tracking. We first examined correlations between each metric and expert scores for each rubric criteria. Then, using these findings, we trained an ordinal regression model to predict scores from each tracking modality and compared correlation strengths with expert scores.</p><p><strong>Results: </strong>Metrics from object detection generally showed stronger correlations with expert scores than 6-DOF tracking. For score prediction, 6-DOF tracking showed no significant advantage, while scores predicted from object detection achieved significantly stronger correlations with expert scores for four scoring criteria.</p><p><strong>Conclusion: </strong>Our results indicate that skill assessment from monocular surgical microscope video can match, and in some cases exceed, the correlation strengths of 6-DOF tracking assessments. This finding supports the feasibility of using object detection for skill assessment without additional hardware.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2219-2230"},"PeriodicalIF":2.3,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144059347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
End-to-end 2D/3D registration from pre-operative MRI to intra-operative fluoroscopy for orthopedic procedures
Pub Date: 2025-11-01 | Epub Date: 2025-05-30 | DOI: 10.1007/s11548-025-03426-w
Ping-Cheng Ku, Mingxu Liu, Robert Grupp, Andrew Harris, Julius K Oni, Simon C Mears, Alejandro Martin-Gomez, Mehran Armand
Purpose: Soft tissue pathologies and bone defects are not easily visible in intra-operative fluoroscopic images; therefore, we develop an end-to-end MRI-to-fluoroscopic image registration framework, aiming to enhance intra-operative visualization for surgeons during orthopedic procedures.
Methods: The proposed framework utilizes deep learning to segment MRI scans and generate synthetic CT (sCT) volumes. These sCT volumes are then used to produce digitally reconstructed radiographs (DRRs), enabling 2D/3D registration with intra-operative fluoroscopic images. The framework's performance was validated through simulation and cadaver studies for core decompression (CD) surgery, focusing on the registration accuracy of femur and pelvic regions.
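To make the DRR-plus-similarity loop concrete, the sketch below sums a hypothetical synthetic CT volume along one axis as a crude parallel-projection DRR and scores it against a fluoroscopy frame with normalized cross-correlation; a 2D/3D optimizer would perturb the pose, regenerate the DRR, and maximize this similarity. This is a heavily simplified stand-in, not the authors' framework, which uses learned segmentation, sCT synthesis, and full projective geometry.

```python
import numpy as np

def drr_parallel(volume_hu, axis=0):
    """Very simplified digitally reconstructed radiograph: parallel projection of
    attenuation-like values obtained by summing the volume along one axis."""
    attenuation = np.clip(volume_hu + 1000.0, 0, None)   # crude HU -> attenuation proxy
    return attenuation.sum(axis=axis)

def ncc(a, b):
    """Normalized cross-correlation, a common similarity metric for 2D/3D registration."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

# Hypothetical synthetic CT (sCT) volume and a noisy stand-in for the fluoroscopy frame.
sct = np.random.default_rng(0).normal(0.0, 50.0, size=(64, 64, 64))
drr = drr_parallel(sct, axis=0)
fluoro = drr + np.random.default_rng(1).normal(0.0, 5.0, size=drr.shape)
similarity = ncc(drr, fluoro)
```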
Results: The framework achieved a mean translational registration accuracy of 2.4 ± 1.0 mm and a rotational accuracy of 1.6 ± 0.8° for the femoral region in cadaver studies. The method successfully enabled intra-operative visualization of necrotic lesions that were not visible on conventional fluoroscopic images, marking a significant advancement in image guidance for femur and pelvic surgeries.
Conclusion: The MRI-to-fluoroscopic registration framework offers a novel approach to image guidance in orthopedic surgeries, exclusively using MRI without the need for CT scans. This approach enhances the visualization of soft tissues and bone defects, reduces radiation exposure, and provides a safer, more effective alternative for intra-operative surgical guidance.
{"title":"End-to-end 2D/3D registration from pre-operative MRI to intra-operative fluoroscopy for orthopedic procedures.","authors":"Ping-Cheng Ku, Mingxu Liu, Robert Grupp, Andrew Harris, Julius K Oni, Simon C Mears, Alejandro Martin-Gomez, Mehran Armand","doi":"10.1007/s11548-025-03426-w","DOIUrl":"10.1007/s11548-025-03426-w","url":null,"abstract":"<p><strong>Purpose: </strong>Soft tissue pathologies and bone defects are not easily visible in intra-operative fluoroscopic images; therefore, we develop an end-to-end MRI-to-fluoroscopic image registration framework, aiming to enhance intra-operative visualization for surgeons during orthopedic procedures.</p><p><strong>Methods: </strong>The proposed framework utilizes deep learning to segment MRI scans and generate synthetic CT (sCT) volumes. These sCT volumes are then used to produce digitally reconstructed radiographs (DRRs), enabling 2D/3D registration with intra-operative fluoroscopic images. The framework's performance was validated through simulation and cadaver studies for core decompression (CD) surgery, focusing on the registration accuracy of femur and pelvic regions.</p><p><strong>Results: </strong>The framework achieved a mean translational registration accuracy of 2.4 ± 1.0 mm and rotational accuracy of 1.6 ± <math><mrow><mn>0</mn> <mo>.</mo> <msup><mn>8</mn> <mo>∘</mo></msup> </mrow> </math> for the femoral region in cadaver studies. The method successfully enabled intra-operative visualization of necrotic lesions that were not visible on conventional fluoroscopic images, marking a significant advancement in image guidance for femur and pelvic surgeries.</p><p><strong>Conclusion: </strong>The MRI-to-fluoroscopic registration framework offers a novel approach to image guidance in orthopedic surgeries, exclusively using MRI without the need for CT scans. This approach enhances the visualization of soft tissues and bone defects, reduces radiation exposure, and provides a safer, more effective alternative for intra-operative surgical guidance.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2355-2366"},"PeriodicalIF":2.3,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144188521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Near-infrared beacons: tracking anatomy with biocompatible fluorescent dots for mixed reality surgical navigation
Pub Date: 2025-11-01 | Epub Date: 2025-05-01 | DOI: 10.1007/s11548-025-03379-0
Wenhao Gu, Justin D Opfermann, Jonathan Knopf, Axel Krieger, Mathias Unberath
Purpose: Mixed reality for surgical navigation is an emerging tool for precision surgery. Achieving reliable surgical guidance hinges on robust tracking of the mixed reality device relative to patient anatomy. Contemporary approaches either introduce bulky fiducials that need to be invasively attached to the anatomy or make strong assumptions about the patient remaining stationary.
Methods: We present an approach to anatomy tracking that relies on biocompatible near-infrared fluorescent (NIRF) dots. Dots are quickly placed on the anatomy intra-operatively and the pose is tracked reliably via PnP-type methods. We demonstrate the potential of our NIRF dots approach to track patient movements after image registration on a 3D printed model, simulating an image-guided navigation process with a tablet-based mixed reality scenario.
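A minimal sketch of PnP-type pose recovery from detected dots, using OpenCV's solvePnP; the dot coordinates, image detections, and camera intrinsics below are placeholder assumptions rather than the authors' calibration.

```python
import numpy as np
import cv2

# Hypothetical 3D positions of the NIRF dots in the anatomy (model) frame, in mm,
# and their detected 2D centroids in the infrared image, in pixels.
object_points = np.array([[0, 0, 0], [40, 0, 0], [0, 40, 0], [40, 40, 0],
                          [20, 20, 15], [10, 30, 8]], dtype=np.float64)
image_points = np.array([[320, 240], [400, 242], [318, 322], [402, 325],
                         [362, 283], [340, 300]], dtype=np.float64)

# Assumed pinhole intrinsics of the tracking camera; no lens distortion.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)

# PnP-type pose estimation: recover the anatomy pose relative to the camera.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist,
                              flags=cv2.SOLVEPNP_ITERATIVE)
R, _ = cv2.Rodrigues(rvec)   # 3x3 rotation; tvec is the translation in camera coordinates
```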
Results: The dot-based pose tracking demonstrated an average accuracy of 1.13 mm in translation and 0.69 degrees in rotation under static conditions, and 1.39 mm and 1.10 degrees, respectively, under dynamic conditions.
Conclusion: Our results are promising and encourage further research in the viability of integrating NIRF dots in mixed reality surgical navigation. These biocompatible dots may allow for reliable tracking of patient motion post-registration, providing a convenient alternative to invasive marker arrays. While our initial tests used a tablet, adaptation to head-mounted displays is plausible with suitable sensors.
{"title":"Near-infrared beacons: tracking anatomy with biocompatible fluorescent dots for mixed reality surgical navigation.","authors":"Wenhao Gu, Justin D Opfermann, Jonathan Knopf, Axel Krieger, Mathias Unberath","doi":"10.1007/s11548-025-03379-0","DOIUrl":"10.1007/s11548-025-03379-0","url":null,"abstract":"<p><strong>Purpose: </strong>Mixed reality for surgical navigation is an emerging tool for precision surgery. Achieving reliable surgical guidance hinges on robust tracking of the mixed reality device relative to patient anatomy. Contemporary approaches either introduce bulky fiducials that need to be invasively attached to the anatomy or make strong assumptions about the patient remaining stationary.</p><p><strong>Methods: </strong>We present an approach to anatomy tracking that relies on biocompatible near-infrared fluorescent (NIRF) dots. Dots are quickly placed on the anatomy intra-operatively and the pose is tracked reliably via PnP-type methods. We demonstrate the potential of our NIRF dots approach to track patient movements after image registration on a 3D printed model, simulating an image-guided navigation process with a tablet-based mixed reality scenario.</p><p><strong>Results: </strong>The dot-based pose tracking demonstrated an average accuracy of 1.13 mm in translation and 0.69 degrees in rotation under static conditions, and 1.39 mm/1.10 degrees, respectively, under dynamic conditions.</p><p><strong>Conclusion: </strong>Our results are promising and encourage further research in the viability of integrating NIRF dots in mixed reality surgical navigation. These biocompatible dots may allow for reliable tracking of patient motion post-registration, providing a convenient alternative to invasive marker arrays. While our initial tests used a tablet, adaptation to head-mounted displays is plausible with suitable sensors.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2309-2318"},"PeriodicalIF":2.3,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144063150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Training a deep learning model to predict the anatomy irradiated in fluoroscopic x-ray images
Pub Date: 2025-11-01 | Epub Date: 2025-05-26 | DOI: 10.1007/s11548-025-03422-0
Lunchi Guo, Dennis Trujillo, James R Duncan, M Allan Thomas
Purpose: Accurate patient dosimetry estimates from fluoroscopically-guided interventions (FGIs) are hindered by limited knowledge of the specific anatomy that was irradiated. Current methods use data reported by the equipment to estimate the patient anatomy exposed during each irradiation event. We propose a deep learning algorithm to automatically match 2D fluoroscopic images with corresponding anatomical regions in computational phantoms, enabling more precise patient dose estimates.
Methods: Our method involves two main steps: (1) simulating 2D fluoroscopic images, and (2) developing a deep learning algorithm to predict anatomical coordinates from these images. For part (1), we utilized DeepDRR for fast and realistic simulation of 2D x-ray images from 3D computed tomography datasets. We generated a diverse set of simulated fluoroscopic images from various regions with different field sizes. In part (2), we employed a Residual Neural Network (ResNet) architecture combined with metadata processing to effectively integrate patient-specific information (age and gender) to learn the transformation between 2D images and specific anatomical coordinates in each representative phantom. For the Modified ResNet model, we defined an allowable error range of ± 10 mm.
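One plausible way to fuse image features with patient metadata for coordinate regression is sketched below: a torchvision ResNet-18 backbone whose pooled features are concatenated with a small metadata embedding before a regression head. The layer sizes and fusion scheme are assumptions, not the authors' modified ResNet.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CoordRegressor(nn.Module):
    """Sketch of a ResNet-based regressor that fuses image features with patient
    metadata (age, gender) to predict 3D anatomical coordinates in a phantom."""

    def __init__(self, metadata_dim=2, out_dim=3):
        super().__init__()
        self.backbone = resnet18(weights=None)
        feat_dim = self.backbone.fc.in_features          # 512 for ResNet-18
        self.backbone.fc = nn.Identity()                 # keep pooled image features
        self.meta_mlp = nn.Sequential(nn.Linear(metadata_dim, 32), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(feat_dim + 32, 128), nn.ReLU(),
                                  nn.Linear(128, out_dim))

    def forward(self, image, metadata):
        feats = self.backbone(image)                     # (B, 512)
        meta = self.meta_mlp(metadata)                   # (B, 32)
        return self.head(torch.cat([feats, meta], dim=1))

# Hypothetical batch: grayscale fluoroscopy replicated to 3 channels, plus [age, gender].
model = CoordRegressor()
images = torch.randn(4, 3, 224, 224)
metadata = torch.tensor([[35.0, 0.0], [7.0, 1.0], [62.0, 0.0], [1.0, 1.0]])
coords = model(images, metadata)                          # (4, 3) predicted x, y, z
```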
Results: The proposed method achieved over 90% of predictions within ± 10 mm, with strong alignment between predicted and true coordinates as confirmed by Bland-Altman analysis. Most errors were within ± 2%, with outliers beyond ± 5% primarily in Z-coordinates for infant phantoms due to their limited representation in the training data. These findings highlight the model's accuracy and its potential for precise spatial localization, while emphasizing the need for improved performance in specific anatomical regions.
Conclusion: In this work, a comprehensive simulated 2D fluoroscopy image dataset was developed, addressing the scarcity of real clinical datasets and enabling effective training of deep-learning models. The modified ResNet successfully achieved precise prediction of anatomical coordinates from the simulated fluoroscopic images, enabling the goal of more accurate patient-specific dosimetry.
{"title":"Training a deep learning model to predict the anatomy irradiated in fluoroscopic x-ray images.","authors":"Lunchi Guo, Dennis Trujillo, James R Duncan, M Allan Thomas","doi":"10.1007/s11548-025-03422-0","DOIUrl":"10.1007/s11548-025-03422-0","url":null,"abstract":"<p><strong>Purpose: </strong>Accurate patient dosimetry estimates from fluoroscopically-guided interventions (FGIs) are hindered by limited knowledge of the specific anatomy that was irradiated. Current methods use data reported by the equipment to estimate the patient anatomy exposed during each irradiation event. We propose a deep learning algorithm to automatically match 2D fluoroscopic images with corresponding anatomical regions in computational phantoms, enabling more precise patient dose estimates.</p><p><strong>Methods: </strong>Our method involves two main steps: (1) simulating 2D fluoroscopic images, and (2) developing a deep learning algorithm to predict anatomical coordinates from these images. For part (1), we utilized DeepDRR for fast and realistic simulation of 2D x-ray images from 3D computed tomography datasets. We generated a diverse set of simulated fluoroscopic images from various regions with different field sizes. In part (2), we employed a Residual Neural Network (ResNet) architecture combined with metadata processing to effectively integrate patient-specific information (age and gender) to learn the transformation between 2D images and specific anatomical coordinates in each representative phantom. For the Modified ResNet model, we defined an allowable error range of ± 10 mm.</p><p><strong>Results: </strong>The proposed method achieved over 90% of predictions within ± 10 mm, with strong alignment between predicted and true coordinates as confirmed by Bland-Altman analysis. Most errors were within ± 2%, with outliers beyond ± 5% primarily in Z-coordinates for infant phantoms due to their limited representation in the training data. These findings highlight the model's accuracy and its potential for precise spatial localization, while emphasizing the need for improved performance in specific anatomical regions.</p><p><strong>Conclusion: </strong>In this work, a comprehensive simulated 2D fluoroscopy image dataset was developed, addressing the scarcity of real clinical datasets and enabling effective training of deep-learning models. The modified ResNet successfully achieved precise prediction of anatomical coordinates from the simulated fluoroscopic images, enabling the goal of more accurate patient-specific dosimetry.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2345-2353"},"PeriodicalIF":2.3,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144144393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient needle guidance: multi-camera augmented reality navigation without patient-specific calibration
Pub Date: 2025-11-01 | DOI: 10.1007/s11548-025-03477-z
Yizhi Wei, Bingyu Huang, Bolin Zhao, Zhengyu Lin, Steven Zhiying Zhou
Purpose: Augmented reality (AR) technology holds significant promise for enhancing surgical navigation in needle-based procedures such as biopsies and ablations. However, most existing AR systems rely on patient-specific markers, which disrupt clinical workflows and require time-consuming preoperative calibrations, thereby hindering operational efficiency and precision.
Methods: We developed a novel multi-camera AR navigation system that eliminates the need for patient-specific markers by utilizing ceiling-mounted markers mapped to fixed medical imaging devices. A hierarchical optimization framework integrates both marker mapping and multi-camera calibration. Deep learning techniques are employed to enhance marker detection and registration accuracy. Additionally, a vision-based pose compensation method is implemented to mitigate errors caused by patient movement, improving overall positional accuracy.
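The marker-to-device chaining at the heart of such a system can be written as a composition of rigid transforms: a live camera-to-ceiling-marker pose combined with an offline ceiling-marker-to-imaging-device calibration, optionally corrected by a pose-compensation transform for patient motion. The sketch below is illustrative only; all transform values are placeholders.

```python
import numpy as np

def rt_to_matrix(R, t):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t, dtype=float)
    return T

# Hypothetical calibration chain. T_b_a maps points from frame a to frame b.
# ceiling marker -> imaging device: measured once, offline (hierarchical optimization result)
T_device_marker = rt_to_matrix(np.eye(3), [1500.0, -200.0, 2200.0])
# camera -> ceiling marker: estimated live from marker detection (e.g. a PnP solve)
T_marker_camera = rt_to_matrix(np.eye(3), [-300.0, 100.0, -1800.0])
# vision-based pose compensation for patient motion, expressed in the device frame
T_compensation = rt_to_matrix(np.eye(3), [2.0, -1.0, 0.5])

# Composed transform: the AR camera pose in the imaging-device/patient frame.
T_device_camera = T_compensation @ T_device_marker @ T_marker_camera

needle_tip_camera = np.array([0.0, 0.0, 150.0, 1.0])       # homogeneous point, mm
needle_tip_device = T_device_camera @ needle_tip_camera
```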
Results: Validation through phantom experiments and simulated clinical scenarios demonstrated an average puncture accuracy of 3.72 ± 1.21 mm. The system reduced needle placement time by 20 s compared to traditional marker-based methods. It also effectively corrected errors induced by patient movement, with a mean positional error of 0.38 pixels and an angular deviation of 0.51°. These results highlight the system's precision, adaptability, and reliability in realistic surgical conditions.
Conclusion: This marker-free AR guidance system significantly streamlines surgical workflows while enhancing needle navigation accuracy. Its simplicity, cost-effectiveness, and adaptability make it an ideal solution for both high- and low-resource clinical environments, offering the potential for improved precision, reduced procedural time, and better patient outcomes.
{"title":"Efficient needle guidance: multi-camera augmented reality navigation without patient-specific calibration.","authors":"Yizhi Wei, Bingyu Huang, Bolin Zhao, Zhengyu Lin, Steven Zhiying Zhou","doi":"10.1007/s11548-025-03477-z","DOIUrl":"10.1007/s11548-025-03477-z","url":null,"abstract":"<p><strong>Purpose: </strong>Augmented reality (AR) technology holds significant promise for enhancing surgical navigation in needle-based procedures such as biopsies and ablations. However, most existing AR systems rely on patient-specific markers, which disrupt clinical workflows and require time-consuming preoperative calibrations, thereby hindering operational efficiency and precision.</p><p><strong>Methods: </strong>We developed a novel multi-camera AR navigation system that eliminates the need for patient-specific markers by utilizing ceiling-mounted markers mapped to fixed medical imaging devices. A hierarchical optimization framework integrates both marker mapping and multi-camera calibration. Deep learning techniques are employed to enhance marker detection and registration accuracy. Additionally, a vision-based pose compensation method is implemented to mitigate errors caused by patient movement, improving overall positional accuracy.</p><p><strong>Results: </strong>Validation through phantom experiments and simulated clinical scenarios demonstrated an average puncture accuracy of 3.72 ± 1.21 mm. The system reduced needle placement time by 20 s compared to traditional marker-based methods. It also effectively corrected errors induced by patient movement, with a mean positional error of 0.38 pixels and an angular deviation of 0.51 <math><mmultiscripts><mrow></mrow> <mrow></mrow> <mo>∘</mo></mmultiscripts> </math> . These results highlight the system's precision, adaptability, and reliability in realistic surgical conditions.</p><p><strong>Conclusion: </strong>This marker-free AR guidance system significantly streamlines surgical workflows while enhancing needle navigation accuracy. Its simplicity, cost-effectiveness, and adaptability make it an ideal solution for both high- and low-resource clinical environments, offering the potential for improved precision, reduced procedural time, and better patient outcomes.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2281-2291"},"PeriodicalIF":2.3,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144621053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DCEM-TCRCN: an innovative approach to depression detection using wearable IoT devices and deep learning
Pub Date: 2025-11-01 | Epub Date: 2025-09-18 | DOI: 10.1007/s11548-025-03479-x
Xinfeng Xiao, Shijun Li, Wei Yu
Purpose: Depression is a psychological disorder with significant implications for public health, so it is important to develop models that support effective and accurate diagnosis. This paper proposes a Dynamic Convolutional Encoder Model based on a Temporal Circular Residual Convolutional Network (DCEM-TCRCN), a novel approach for diagnosing depression using wearable Internet-of-Things sensors.
Methods: DCEM integrates Mobile Inverted Bottleneck Convolution (MBConv) blocks with Dynamic Convolution (DConv) to strengthen feature extraction, allowing the system to adapt to input changes and effectively extract depression-correlated patterns. The TCRCN model further improves performance by using circular dilated convolutions to capture long-range temporal relations and eliminate boundary effects. Temporal attention mechanisms focus on important patterns in the data, while weight normalization, GELU activation, and dropout ensure stability, regularization, and convergence.
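A minimal PyTorch sketch of a circular dilated residual block in the spirit of the description above: circular padding lets the dilated temporal convolution wrap around the input window (avoiding boundary effects), and weight normalization, GELU activation, dropout, and a residual connection follow the listed ingredients. Channel count, kernel size, and dilation are assumptions.

```python
import torch
import torch.nn as nn

class CircularDilatedResidualBlock(nn.Module):
    """Temporal residual block whose dilated convolution uses circular padding,
    so the receptive field wraps around the sequence instead of hitting a boundary."""

    def __init__(self, channels, kernel_size=3, dilation=2, dropout=0.1):
        super().__init__()
        pad = dilation * (kernel_size - 1) // 2
        self.conv = nn.utils.weight_norm(
            nn.Conv1d(channels, channels, kernel_size,
                      padding=pad, dilation=dilation, padding_mode="circular"))
        self.act = nn.GELU()
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                 # x: (batch, channels, time)
        h = self.drop(self.act(self.conv(x)))
        return x + h                      # residual connection

# Hypothetical wearable-sensor window: 2 channels (HRV, EDA), 256 time steps.
block = CircularDilatedResidualBlock(channels=2)
signal = torch.randn(8, 2, 256)
out = block(signal)                       # same shape as the input
```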
Results: The proposed system uses physiological signals acquired from wearable sensors, including heart rate variability and electrodermal activity. Preprocessing steps such as one-hot encoding and data normalization standardize the inputs for feature extraction. Two fully connected layers then classify the pooled learned representations to predict depression states.
Conclusion: Experimental analysis on the Depression Dataset confirmed the improved performance of the DCEM-TCRCN model with an accuracy of 98.88%, precision of 97.76%, recall of 98.21%, and a Cohen-Kappa score of 97.99%. The findings confirm the efficacy, trustworthiness, and stability of the model, making it usable for real-time psychological health monitoring.
{"title":"DCEM-TCRCN: an innovative approach to depression detection using wearable IoT devices and deep learning.","authors":"Xinfeng Xiao, Shijun Li, Wei Yu","doi":"10.1007/s11548-025-03479-x","DOIUrl":"10.1007/s11548-025-03479-x","url":null,"abstract":"<p><strong>Purpose: </strong>Depression is a psychological disorder that has vital implications for society's health. So, it is important to develop a model that aids in effective and accurate depression diagnosis. This paper proposes a Dynamic Convolutional Encoder Model based on a Temporal Circular Residual Convolutional Network (DCEM-TCRCN), a novel approach for diagnosing depression using wearable Internet-of-Things sensors.</p><p><strong>Methods: </strong>DCEM integrates Mobile Inverted Bottleneck Convolution (MBConv) blocks with Dynamic Convolution (DConv) to maximize feature extraction and allow the system to react to input changes and effectively extract depression-correlated patterns. The TCRCN model improves the performance using circular dilated convolution to address long-range temporal relations and eliminate boundary effects. Temporal attention mechanisms deal with important patterns in the data, while weight normalization, GELU activation, and dropout assure stability, regularization, and convergence.</p><p><strong>Results: </strong>The proposed system applies physiological information acquired from wearable sensors, including heart rate variability and electrodermal activity. Preprocessing tasks like one-hot encoding and data normalization normalize inputs to enable successful feature extraction. Dual fully connected layers perform classifications using pooled learned representations to make accurate predictions regarding depression states.</p><p><strong>Conclusion: </strong>Experimental analysis on the Depression Dataset confirmed the improved performance of the DCEM-TCRCN model with an accuracy of 98.88%, precision of 97.76%, recall of 98.21%, and a Cohen-Kappa score of 97.99%. The findings confirm the efficacy, trustworthiness, and stability of the model, making it usable for real-time psychological health monitoring.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2301-2308"},"PeriodicalIF":2.3,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145082323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing trustworthiness in model-guided medicine with a model identity certificate (MIC): starting with interventional disciplines
Pub Date: 2025-11-01 | Epub Date: 2025-10-23 | DOI: 10.1007/s11548-025-03533-8
Heinz U Lemke
Purpose: Model-guided medicine (MGM) represents a paradigm shift in clinical practice, emphasizing the integration of computational models to support diagnosis, therapy planning and individualized patient care. The general and/or specific domain models, on which recommendations, decisions or actions of these systems are based, should reflect in their model identity certificate (MIC) the level of model relevance, truthfulness and transparency.
Methods: Methods and tools for building models and their corresponding templates for a MIC in the domains of radiology and surgery should be drawn from relevant elements of a model science, specifically from mathematical modelling methods (e.g. for model truthfulness) and modelling informatics tools (e.g. for model transparency). Other elements or MIC classes to consider may include ethics, human-AI model interaction and model control.
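Purely as an illustration of how MIC classes and attributes might be organized in software (not the paper's template), consider the following data structure; the class names follow the elements listed above, while the attribute keys and example values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class MICClass:
    """One class of the certificate, holding named attributes and supporting evidence."""
    attributes: Dict[str, str] = field(default_factory=dict)
    evidence: List[str] = field(default_factory=list)

@dataclass
class ModelIdentityCertificate:
    """Illustrative structure only; the generic MIC template in the paper defines
    its own classes and attributes."""
    model_name: str
    domain: str
    relevance: MICClass = field(default_factory=MICClass)
    truthfulness: MICClass = field(default_factory=MICClass)
    transparency: MICClass = field(default_factory=MICClass)
    ethics: MICClass = field(default_factory=MICClass)
    human_ai_interaction: MICClass = field(default_factory=MICClass)
    model_control: MICClass = field(default_factory=MICClass)

mic = ModelIdentityCertificate(
    model_name="Robot-assisted intervention model for hepatocellular cancer",
    domain="interventional radiology",
)
mic.truthfulness.attributes["method"] = "mathematical modelling methods used to assess truthfulness"
mic.transparency.attributes["tooling"] = "modelling informatics tools used to document model structure"
```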
Results: A generic template of a MIC with classes, attributes and examples for the general domain of health care is being proposed as an initial attempt to gain experience with the complexity of the problems associated with enhancing trustworthiness in models. This template is intended to serve as a framework for an instance of a specific template for robot assisted intervention for hepatocellular cancer within the domain of interventional radiology (work-in-progress).
Conclusion: Gaining trustworthiness in intelligent systems based on models and related AI tools is a challenging undertaking and raises many critical questions, specifically those related to ascertaining model relevance, truthfulness and transparency. The healthcare system, in particular the interventional medical disciplines, will have to be concerned with the availability of digital identity certificates to enable control of these systems and related artefacts, e.g. digital twins, avatars, diagnostic and interventional robots, or intelligent agents.
{"title":"Enhancing trustworthiness in model-guided medicine with a model identity certificate (MIC): starting with interventional disciplines.","authors":"Heinz U Lemke","doi":"10.1007/s11548-025-03533-8","DOIUrl":"10.1007/s11548-025-03533-8","url":null,"abstract":"<p><strong>Purpose: </strong>Model-guided medicine (MGM) represents a paradigm shift in clinical practice, emphasizing the integration of computational models to support diagnosis, therapy planning and individualized patient care. The general and/or specific domain models, on which recommendations, decisions or actions of these systems are based, should reflect in their model identity certificate (MIC) the level of model relevance, truthfulness and transparency.</p><p><strong>Methods: </strong>Methods and tools for building models and their corresponding templates for a MIC in the domains of radiology and surgery should be drawn from relevant elements of a model science, specifically from mathematical modelling methods (e.g. for model truthfulness) and modelling informatics tools (e.g. for model transparency). Other elements or MIC classes to consider may include ethics, human-AI model interaction and model control.</p><p><strong>Results: </strong>A generic template of a MIC with classes, attributes and examples for the general domain of health care is being proposed as an initial attempt to gain experience with the complexity of the problems associated with enhancing trustworthiness in models. This template is intended to serve as a framework for an instance of a specific template for robot assisted intervention for hepatocellular cancer within the domain of interventional radiology (work-in-progress).</p><p><strong>Conclusion: </strong>Gaining trustworthiness in intelligent systems based on models and related AI tools is a challenging undertaking and raises many critical questions, specifically those related to ascertain model relevance, truthfulness and transparency. The healthcare system, in particular the interventional medical disciplines, will have to be concerned about the availability of digital identity certificates to enable the control for these systems and related artefacts, e.g. digital twins, avatars, diagnostic and interventional robots, or intelligent agents.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2191-2198"},"PeriodicalIF":2.3,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145349828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}