Pub Date: 2025-11-01 | Epub Date: 2025-06-20 | DOI: 10.1007/s11548-025-03437-7
Mengya Xu, Wenjin Mo, Guankun Wang, Huxin Gao, An Wang, Ning Zhong, Zhen Li, Xiaoxiao Yang, Hongliang Ren
Purpose: The intricate nature of endoscopic surgical environments poses significant challenges for dissection zone segmentation. In particular, the boundaries between different tissue types lack clarity, which can lead to significant segmentation errors as models misidentify or overlook object edges altogether. The goal of this work is therefore to provide precise dissection zone suggestions under these challenging conditions during endoscopic submucosal dissection (ESD) procedures and to enhance the overall safety of ESD.
Methods: We introduce a prompt-based dissection zone segmentation (PDZSeg) model, aimed at segmenting dissection zones and specifically designed to incorporate different visual prompts, such as scribbles and bounding boxes. Our approach overlays these visual cues directly onto the images and fine-tunes a foundation model on a specialized dataset created to handle diverse visual prompt instructions. This shift toward more flexible input methods is intended to improve both dissection zone segmentation performance and the overall user experience.
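The overlay step described in the Methods can be sketched as simple alpha blending of a prompt mask onto the image. The function name `overlay_prompt`, the mask layout, and the color/alpha values are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def overlay_prompt(image, mask, color=(0, 255, 0), alpha=0.5):
    """Blend a visual prompt (boolean mask) onto an RGB image.

    image: (H, W, 3) uint8 array
    mask:  (H, W) boolean array marking scribble/box pixels
    (hypothetical sketch, not the authors' code)
    """
    out = image.astype(np.float32)
    color = np.asarray(color, dtype=np.float32)
    # alpha-blend the prompt color only where the mask is set
    out[mask] = (1 - alpha) * out[mask] + alpha * color
    return out.astype(np.uint8)

# Example: a 4x4 gray image with a 2-pixel "scribble"
img = np.full((4, 4, 3), 100, dtype=np.uint8)
scribble = np.zeros((4, 4), dtype=bool)
scribble[1, 1] = scribble[2, 2] = True
blended = overlay_prompt(img, scribble)
```

The blended image can then be fed to the segmentation model like any ordinary RGB input, which is what makes this prompting style architecture-agnostic.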
Results: We evaluate our approach in three experimental setups: in-domain evaluation, evaluation under variability in visual prompt availability, and robustness assessment. Validated on the ESD-DZSeg dataset, which targets the dissection zone segmentation task in ESD, our solution outperforms state-of-the-art segmentation methods on this task. To the best of our knowledge, this is the first study to incorporate visual prompt design in dissection zone segmentation.
Conclusion: We introduce the prompt-based dissection zone segmentation (PDZSeg) model, which is specifically designed for dissection zone segmentation and can effectively utilize various visual prompts, including scribbles and bounding boxes. The model improves segmentation performance and user experience by pairing a specialized dataset with a novel visual referral method that optimizes the architecture and boosts the effectiveness of dissection zone suggestions. Furthermore, we present the ESD-DZSeg dataset for robot-assisted ESD, which serves as a benchmark for assessing dissection zone suggestions and visual prompt interpretation, laying the groundwork for future research in this field. Our code is available at https://github.com/FrankMOWJ/PDZSeg.
Title: PDZSeg: adapting the foundation model for dissection zone segmentation with visual prompts in robot-assisted endoscopic submucosal dissection.
International Journal of Computer Assisted Radiology and Surgery, pp. 2335-2344. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12575525/pdf/
Pub Date: 2025-11-01 | Epub Date: 2025-06-07 | DOI: 10.1007/s11548-025-03432-y
Balázs Faludi, Marek Żelechowski, Maria Licci, Norbert Zentai, Attill Saemann, Daniel Studer, Georg Rauter, Raphael Guzman, Carol Hasler, Gregory F Jost, Philippe C Cattin
Purpose: Planning highly complex surgeries in virtual reality (VR) provides a user-friendly and natural way to navigate volumetric medical data and can improve the sense of depth and scale. Using ray marching-based volume rendering to display the data has several benefits over traditional mesh-based rendering, such as offering a more accurate and detailed visualization without the need for prior segmentation and meshing. However, volume rendering can be difficult to extend to support multiple intersecting volumes in a scene while maintaining a high enough update rate for a comfortable user experience in VR.
Methods: Upon loading a volume, a rough ad hoc segmentation is performed using a motion-tracked controller. The segmentation is not used to extract a surface mesh and does not need to precisely define the exact surfaces to be rendered, as it only serves to separate the volume into individual sub-volumes, which are rendered in multiple, consecutive volume rendering passes. For each pass, the ray lengths are written into the camera depth buffer at early ray termination and read in subsequent passes to ensure correct occlusion between individual volumes.
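The depth-buffer handshake between rendering passes can be illustrated with a toy single-ray marcher. The slab parameterization and the `render_passes` helper are hypothetical, standing in for the paper's GPU implementation:

```python
import math

def render_passes(volumes, max_depth=10.0, step=0.1, opacity_cutoff=0.95):
    """Toy single-ray multi-pass renderer (illustrative sketch).

    volumes: list of (near, far, sigma) slabs along the ray; each slab is
    rendered in its own pass. A shared depth buffer keeps the termination
    depth of earlier passes so later passes stop marching behind a surface
    that is already opaque.
    """
    depth_buffer = max_depth
    for near, far, sigma in volumes:
        t, alpha = near, 0.0
        while t < min(far, depth_buffer):      # read depth from earlier passes
            # standard emission-absorption alpha compositing per step
            alpha += (1.0 - alpha) * (1.0 - math.exp(-sigma * step))
            if alpha >= opacity_cutoff:        # early ray termination
                depth_buffer = min(depth_buffer, t)  # write depth
                break
            t += step
    return depth_buffer
```

The key point mirrored from the Methods: the second slab is never marched where the first already terminated, so occlusion between independently rendered volumes comes out correct without merging them into one grid.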
Results: We evaluate the performance of the multi-volume renderer using three different use cases and corresponding datasets. We show that the presented approach can avoid dropped frames at the typical update rate of 90 frames per second of a desktop-based VR system and, therefore, provide a comfortable user experience even in the presence of more than twenty individual volumes.
Conclusion: Our proof-of-concept implementation shows the feasibility of VR-based surgical planning systems, which require dynamic and direct manipulation of the original volumetric data without sacrificing rendering performance and user experience.
Title: Multi-volume rendering using depth buffers for surgical planning in virtual reality.
International Journal of Computer Assisted Radiology and Surgery, pp. 2251-2258. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12575470/pdf/
Pub Date: 2025-11-01 | Epub Date: 2025-07-01 | DOI: 10.1007/s11548-025-03455-5
Karin A Olthof, Matteo Fusaglia, Anne G den Hartog, Niels F M Kok, Theo J M Ruers, Koert F D Kuhlmann
Purpose: Understanding patient-specific liver anatomy is crucial for patient safety and achieving complete treatment of all tumors during surgery. This study evaluates the impact of the use of patient-specific 3D liver models and surgical navigation on procedural complexity in open liver surgery.
Methods: Patients with colorectal liver metastases scheduled for open liver surgery were included between June 2022 and October 2023 at the Netherlands Cancer Institute. Patient-specific 3D liver models could be used upon request during the surgical procedure. Subsequently, surgeons could request additional surgical navigation by landmark registration using an electromagnetically tracked ultrasound transducer. Postoperatively, surgeons assessed the impact of the use of the model and navigation on procedural complexity on a scale from 1 to 10.
Results: Thirty-five patients were included in this study, with a median of 8 tumors per patient (range 3-25). 3D models were utilized in all procedures. Additional navigation was requested in 21 of 35 patients to improve intraoperative planning and tumor localization. The mean procedural complexity score with navigation was 4.3 (95% CI [3.7, 5.0]), compared to 7.8 (95% CI [6.6, 9.0]) with the 3D model alone. Both visualization methods improved lesion localization and provided better anatomical insight.
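For reference, a mean with a 95% confidence interval like those above can be computed as follows. The scores are made up for illustration, and the normal-approximation interval is an assumption (the study may well have used a t-interval):

```python
import math
import statistics

def mean_ci95(scores):
    """Mean with a normal-approximation 95% CI.

    Illustrative only: for small samples a t-interval would be the more
    appropriate choice.
    """
    n = len(scores)
    m = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    return m, (m - 1.96 * se, m + 1.96 * se)

# Hypothetical complexity ratings on the study's 1-10 scale (not real data)
m, (lo, hi) = mean_ci95([3, 4, 5, 4, 4])
```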
Conclusion: 3D models and surgical navigation significantly reduce the complexity of open liver surgery, especially in patients with bilobar disease. These tools enhance intraoperative decision-making and may lead to better surgical outcomes. The stepwise implementation of the visualization techniques in this study underscores the added benefit of surgical navigation beyond 3D modeling alone, supporting its potential for broader clinical implementation.
Title: The impact of 3-dimensional models and surgical navigation for open liver surgery.
International Journal of Computer Assisted Radiology and Surgery, pp. 2213-2218. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12575497/pdf/
Pub Date: 2025-11-01 | Epub Date: 2025-04-25 | DOI: 10.1007/s11548-025-03366-5
Rebecca Hisey, Henry Lee, Adrienne Duimering, John Liu, Vasudha Gupta, Tamas Ungi, Christine Law, Gabor Fichtinger, Matthew Holden
Objective: Video offers an accessible method for automated surgical skill evaluation; however, many platforms still rely on traditional six-degree-of-freedom (6-DOF) tracking systems, which can be costly, cumbersome, and challenging to apply clinically. This study aims to demonstrate that trainee skill in cataract surgery can be assessed effectively using only object detection from monocular surgical microscope video.
Methods: One ophthalmologist and four residents performed cataract surgery on a simulated eye five times each, generating 25 recordings. Recordings included both the surgical microscope video and 6-DOF instrument tracking data. Videos were graded by two expert ophthalmologists using the ICO-OSCAR:SICS rubric. We computed motion-based metrics using both object detection from video and 6-DOF tracking. We first examined correlations between each metric and expert scores for each rubric criterion. Then, using these findings, we trained an ordinal regression model to predict scores from each tracking modality and compared correlation strengths with expert scores.
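Correlation analyses of this kind typically use rank correlation, since expert rubric scores are ordinal. A self-contained Spearman implementation (my own sketch, not the authors' code) looks like this:

```python
def rank(xs):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                      # extend the tie group
        avg = (i + j) / 2 + 1           # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))
```

In practice `scipy.stats.spearmanr` does the same job with significance testing included.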
Results: Metrics from object detection generally showed stronger correlations with expert scores than 6-DOF tracking. For score prediction, 6-DOF tracking showed no significant advantage, while scores predicted from object detection achieved significantly stronger correlations with expert scores for four scoring criteria.
Conclusion: Our results indicate that skill assessment from monocular surgical microscope video can match, and in some cases exceed, the correlation strengths of 6-DOF tracking assessments. This finding supports the feasibility of using object detection for skill assessment without additional hardware.
Title: Objective skill assessment for cataract surgery from surgical microscope video.
International Journal of Computer Assisted Radiology and Surgery, pp. 2219-2230.
Pub Date: 2025-11-01 | Epub Date: 2025-05-30 | DOI: 10.1007/s11548-025-03426-w
Ping-Cheng Ku, Mingxu Liu, Robert Grupp, Andrew Harris, Julius K Oni, Simon C Mears, Alejandro Martin-Gomez, Mehran Armand
Purpose: Soft tissue pathologies and bone defects are not easily visible in intra-operative fluoroscopic images; therefore, we develop an end-to-end MRI-to-fluoroscopic image registration framework, aiming to enhance intra-operative visualization for surgeons during orthopedic procedures.
Methods: The proposed framework utilizes deep learning to segment MRI scans and generate synthetic CT (sCT) volumes. These sCT volumes are then used to produce digitally reconstructed radiographs (DRRs), enabling 2D/3D registration with intra-operative fluoroscopic images. The framework's performance was validated through simulation and cadaver studies for core decompression (CD) surgery, focusing on the registration accuracy of femur and pelvic regions.
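A DRR is essentially a line integral of attenuation through the volume. The following parallel-beam toy shows the core Beer-Lambert computation; the paper presumably uses perspective projection through the sCT, so treat this as an illustrative simplification:

```python
import numpy as np

def drr_parallel(volume, axis=0, spacing=1.0):
    """Parallel-beam DRR: exponentiated line integral of attenuation
    (Beer-Lambert law), summed along one volume axis.

    volume: 3D array of attenuation coefficients (per unit length)
    Returns transmitted intensity in [0, 1] per detector pixel.
    """
    line_integral = volume.sum(axis=axis) * spacing
    return np.exp(-line_integral)

# A small attenuating cube inside an otherwise empty 4x4x4 volume
vol = np.zeros((4, 4, 4))
vol[1:3, 1:3, 1:3] = 0.5
drr = drr_parallel(vol)      # a (4, 4) simulated radiograph
```

2D/3D registration then searches over the pose used to form the DRR until it best matches the intra-operative fluoroscopic image.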
Results: The framework achieved a mean translational registration accuracy of 2.4 ± 1.0 mm and a rotational accuracy of 1.6 ± 0.8° for the femoral region in cadaver studies. The method successfully enabled intra-operative visualization of necrotic lesions that were not visible on conventional fluoroscopic images, marking a significant advancement in image guidance for femur and pelvic surgeries.
Conclusion: The MRI-to-fluoroscopic registration framework offers a novel approach to image guidance in orthopedic surgeries, exclusively using MRI without the need for CT scans. This approach enhances the visualization of soft tissues and bone defects, reduces radiation exposure, and provides a safer, more effective alternative for intra-operative surgical guidance.
Title: End-to-end 2D/3D registration from pre-operative MRI to intra-operative fluoroscopy for orthopedic procedures.
International Journal of Computer Assisted Radiology and Surgery, pp. 2355-2366.
Pub Date: 2025-11-01 | Epub Date: 2025-06-19 | DOI: 10.1007/s11548-025-03447-5
Alberto Neri, Maximilan Fehrentz, Veronica Penza, Leonardo S Mattos, Nazim Haouchine
Purpose: Neural radiance fields (NeRF) offer exceptional capabilities for 3D reconstruction and view synthesis, yet their reliance on extensive multi-view data limits their application in surgical intraoperative settings where only limited data are available. This work addresses this challenge by leveraging a single intraoperative image and preoperative data to train NeRF efficiently for surgical scenarios.
Methods: We leverage preoperative MRI data to define the set of camera viewpoints and images needed for robust and unobstructed training. Intraoperatively, the appearance of the surgical image is transferred to the pre-constructed training set through neural style transfer, specifically combining WTC² and STROTSS to prevent over-stylization. This process enables the creation of a dataset for fast single-image NeRF training.
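One common way to define a set of camera viewpoints around a target, as the Methods require, is a Fibonacci sphere lattice. This helper is an assumption for illustration, not the authors' sampling scheme:

```python
import math

def fibonacci_viewpoints(n, radius=1.0):
    """n roughly uniform camera positions on a sphere of given radius
    around the target; each position could seed one NeRF training view.
    (Illustrative sketch; the paper derives viewpoints from the MRI.)
    """
    pts = []
    golden = math.pi * (3.0 - math.sqrt(5.0))  # golden-angle increment
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n          # evenly spaced heights
        r = math.sqrt(max(0.0, 1.0 - z * z))   # ring radius at that height
        theta = golden * i
        pts.append((radius * r * math.cos(theta),
                    radius * r * math.sin(theta),
                    radius * z))
    return pts
```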
Results: The method is evaluated with four clinical neurosurgical cases. Quantitative comparisons to NeRF models trained on real surgical microscope images demonstrate strong synthesis agreement, with similarity metrics indicating high reconstruction fidelity and stylistic alignment. When compared with ground truth, our method demonstrates high structural similarity, confirming good reconstruction quality and texture preservation.
Conclusion: Our approach demonstrates the feasibility of single-image NeRF training in surgical settings, overcoming the limitations of traditional multi-view methods. By eliminating the dependency on a large multi-view dataset, our method offers a faster, more adaptable solution for generating accurate 3D reconstructions in real-time surgical scenarios.
Title: Surgical neural radiance fields from one image.
International Journal of Computer Assisted Radiology and Surgery, pp. 2329-2333. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12834228/pdf/
Pub Date: 2025-11-01 | Epub Date: 2025-05-01 | DOI: 10.1007/s11548-025-03379-0
Wenhao Gu, Justin D Opfermann, Jonathan Knopf, Axel Krieger, Mathias Unberath
Purpose: Mixed reality for surgical navigation is an emerging tool for precision surgery. Achieving reliable surgical guidance hinges on robust tracking of the mixed reality device relative to patient anatomy. Contemporary approaches either introduce bulky fiducials that need to be invasively attached to the anatomy or make strong assumptions about the patient remaining stationary.
Methods: We present an approach to anatomy tracking that relies on biocompatible near-infrared fluorescent (NIRF) dots. Dots are quickly placed on the anatomy intra-operatively and the pose is tracked reliably via PnP-type methods. We demonstrate the potential of our NIRF dots approach to track patient movements after image registration on a 3D printed model, simulating an image-guided navigation process with a tablet-based mixed reality scenario.
Results: The dot-based pose tracking demonstrated an average accuracy of 1.13 mm in translation and 0.69 degrees in rotation under static conditions, and 1.39 mm and 1.10 degrees under dynamic conditions.
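Translation and rotation accuracies like those reported are commonly computed from the relative pose between estimate and ground truth. `pose_errors` below is an illustrative sketch of those two metrics, not the study's evaluation code:

```python
import math
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Translation error (same units as t) and rotation error in degrees
    between an estimated and a ground-truth rigid pose (R, t)."""
    t_err = float(np.linalg.norm(np.asarray(t_est, float) - np.asarray(t_gt, float)))
    # angle of the relative rotation, via the trace formula
    R_rel = np.asarray(R_est).T @ np.asarray(R_gt)
    cos_a = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return t_err, math.degrees(math.acos(cos_a))
```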
Pub Date : 2025-11-01 DOI: 10.1007/s11548-025-03379-0
Near-infrared beacons: tracking anatomy with biocompatible fluorescent dots for mixed reality surgical navigation.
Wenhao Gu, Justin D Opfermann, Jonathan Knopf, Axel Krieger, Mathias Unberath
Purpose: Mixed reality for surgical navigation is an emerging tool for precision surgery. Achieving reliable surgical guidance hinges on robust tracking of the mixed reality device relative to patient anatomy. Contemporary approaches either introduce bulky fiducials that must be invasively attached to the anatomy or make strong assumptions about the patient remaining stationary.
Methods: We present an approach to anatomy tracking that relies on biocompatible near-infrared fluorescent (NIRF) dots. The dots are quickly placed on the anatomy intraoperatively, and the pose is tracked reliably via PnP-type methods. We demonstrate the potential of our NIRF dot approach to track patient movements after image registration on a 3D-printed model, simulating an image-guided navigation process in a tablet-based mixed reality scenario.
Results: The dot-based pose tracking demonstrated an average accuracy of 1.13 mm in translation and 0.69 degrees in rotation under static conditions, and 1.39 mm and 1.10 degrees, respectively, under dynamic conditions.
Conclusion: Our results are promising and encourage further research into the viability of integrating NIRF dots in mixed reality surgical navigation. These biocompatible dots may allow for reliable tracking of patient motion post-registration, providing a convenient alternative to invasive marker arrays. While our initial tests used a tablet, adaptation to head-mounted displays is plausible with suitable sensors.
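The translation and rotation accuracies reported for the dot-based tracking are standard rigid-pose error metrics. As a minimal sketch (not the authors' code, and with purely illustrative example poses), such errors can be computed from a ground-truth and an estimated pose as follows:

```python
import numpy as np

def pose_errors(R_gt, t_gt, R_est, t_est):
    """Translation error (same units as t) and rotation error in degrees."""
    t_err = np.linalg.norm(t_est - t_gt)
    # Relative rotation; its angle is the geodesic distance on SO(3).
    R_rel = R_est @ R_gt.T
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    r_err_deg = np.degrees(np.arccos(cos_theta))
    return t_err, r_err_deg

# Illustrative case: a 1 mm offset and a 0.5-degree rotation about z.
theta = np.radians(0.5)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
t_err, r_err = pose_errors(np.eye(3), np.zeros(3), Rz, np.array([1.0, 0.0, 0.0]))
```

Averaging these two scalars over many frames yields summary figures of the kind quoted above (e.g. 1.13 mm / 0.69 degrees).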
Pub Date : 2025-11-01 Epub Date: 2025-05-26 DOI: 10.1007/s11548-025-03422-0
Training a deep learning model to predict the anatomy irradiated in fluoroscopic x-ray images.
Lunchi Guo, Dennis Trujillo, James R Duncan, M Allan Thomas
Purpose: Accurate patient dosimetry estimates from fluoroscopically guided interventions (FGIs) are hindered by limited knowledge of the specific anatomy that was irradiated. Current methods use data reported by the equipment to estimate the patient anatomy exposed during each irradiation event. We propose a deep learning algorithm to automatically match 2D fluoroscopic images with corresponding anatomical regions in computational phantoms, enabling more precise patient dose estimates.
Methods: Our method involves two main steps: (1) simulating 2D fluoroscopic images, and (2) developing a deep learning algorithm to predict anatomical coordinates from these images. For part (1), we utilized DeepDRR for fast and realistic simulation of 2D x-ray images from 3D computed tomography datasets. We generated a diverse set of simulated fluoroscopic images from various regions with different field sizes. In part (2), we employed a Residual Neural Network (ResNet) architecture combined with metadata processing to effectively integrate patient-specific information (age and gender) to learn the transformation between 2D images and specific anatomical coordinates in each representative phantom. For the Modified ResNet model, we defined an allowable error range of ± 10 mm.
Results: The proposed method achieved over 90% of predictions within ± 10 mm, with strong alignment between predicted and true coordinates as confirmed by Bland-Altman analysis. Most errors were within ± 2%, with outliers beyond ± 5% primarily in Z-coordinates for infant phantoms due to their limited representation in the training data. These findings highlight the model's accuracy and its potential for precise spatial localization, while emphasizing the need for improved performance in specific anatomical regions.
Conclusion: In this work, a comprehensive simulated 2D fluoroscopy image dataset was developed, addressing the scarcity of real clinical datasets and enabling effective training of deep-learning models. The modified ResNet successfully achieved precise prediction of anatomical coordinates from the simulated fluoroscopic images, enabling the goal of more accurate patient-specific dosimetry.
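The metadata-fusion step described above (combining image features with age and gender before regressing anatomical coordinates) can be sketched as follows. This is a hypothetical numpy illustration, not the paper's implementation: the feature dimensions, weights, and the `within_tolerance` helper are invented for the example; only the ± 10 mm allowable error range comes from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_predict(img_feat, meta, W, b):
    """Concatenate image features (e.g. from a ResNet backbone) with patient
    metadata, then apply a linear head regressing a 3D coordinate (x, y, z) in mm."""
    x = np.concatenate([img_feat, meta])
    return W @ x + b

def within_tolerance(pred, true, tol_mm=10.0):
    """Per-axis check against the +/- 10 mm allowable error range."""
    return bool(np.all(np.abs(pred - true) <= tol_mm))

# Toy dimensions: 8 image features + normalized age + one-hot gender (2 values).
img_feat = rng.standard_normal(8)
meta = np.array([0.35, 1.0, 0.0])
W = rng.standard_normal((3, 11)) * 0.1
b = np.zeros(3)
pred = fuse_and_predict(img_feat, meta, W, b)
```

In the paper the image features come from a trained Modified ResNet; here they are random, so only the shapes and the fusion pattern are meaningful.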
Pub Date : 2025-11-01 DOI: 10.1007/s11548-025-03477-z
Efficient needle guidance: multi-camera augmented reality navigation without patient-specific calibration.
Yizhi Wei, Bingyu Huang, Bolin Zhao, Zhengyu Lin, Steven Zhiying Zhou
Purpose: Augmented reality (AR) technology holds significant promise for enhancing surgical navigation in needle-based procedures such as biopsies and ablations. However, most existing AR systems rely on patient-specific markers, which disrupt clinical workflows and require time-consuming preoperative calibrations, thereby hindering operational efficiency and precision.
Methods: We developed a novel multi-camera AR navigation system that eliminates the need for patient-specific markers by utilizing ceiling-mounted markers mapped to fixed medical imaging devices. A hierarchical optimization framework integrates both marker mapping and multi-camera calibration. Deep learning techniques are employed to enhance marker detection and registration accuracy. Additionally, a vision-based pose compensation method is implemented to mitigate errors caused by patient movement, improving overall positional accuracy.
Results: Validation through phantom experiments and simulated clinical scenarios demonstrated an average puncture accuracy of 3.72 ± 1.21 mm. The system reduced needle placement time by 20 s compared to traditional marker-based methods. It also effectively corrected errors induced by patient movement, with a mean positional error of 0.38 pixels and an angular deviation of 0.51 degrees. These results highlight the system's precision, adaptability, and reliability in realistic surgical conditions.
Conclusion: This marker-free AR guidance system significantly streamlines surgical workflows while enhancing needle navigation accuracy. Its simplicity, cost-effectiveness, and adaptability make it an ideal solution for both high- and low-resource clinical environments, offering the potential for improved precision, reduced procedural time, and better patient outcomes.
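The ceiling-marker design described above amounts to chaining rigid transforms: a one-time mapping gives each ceiling marker's pose relative to the fixed imaging device, and a per-frame marker detection gives the camera's pose relative to the marker, so the camera is localized in the device frame with no patient-attached marker. A minimal numpy sketch under that assumption (all pose values are illustrative, not from the paper):

```python
import numpy as np

def hom(R, t):
    """Build a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# T_device_marker: ceiling marker pose in the imaging-device frame (mapped once,
# offline). T_marker_camera: camera pose from per-frame marker detection.
T_device_marker = hom(np.eye(3), np.array([0.0, 0.0, 2.5]))  # marker 2.5 m up

theta = np.radians(90)  # camera tilted to look up at the ceiling marker
R_cam = np.array([[1.0, 0.0, 0.0],
                  [0.0, np.cos(theta), -np.sin(theta)],
                  [0.0, np.sin(theta),  np.cos(theta)]])
T_marker_camera = hom(R_cam, np.array([0.2, 0.0, -1.0]))

# Chaining localizes the camera (and hence the AR overlay) in the device frame.
T_device_camera = T_device_marker @ T_marker_camera
```

The hierarchical optimization in the paper refines these transforms jointly across cameras; the sketch shows only the basic composition.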
Pub Date : 2025-11-01 Epub Date: 2025-09-18 DOI: 10.1007/s11548-025-03479-x
DCEM-TCRCN: an innovative approach to depression detection using wearable IoT devices and deep learning.
Xinfeng Xiao, Shijun Li, Wei Yu
Purpose: Depression is a psychological disorder with serious implications for public health, so it is important to develop models that support effective and accurate diagnosis. This paper proposes a Dynamic Convolutional Encoder Model based on a Temporal Circular Residual Convolutional Network (DCEM-TCRCN), a novel approach to diagnosing depression using wearable Internet-of-Things sensors.
Methods: The DCEM integrates Mobile Inverted Bottleneck Convolution (MBConv) blocks with Dynamic Convolution (DConv) to maximize feature extraction, allowing the system to adapt to input changes and effectively extract depression-correlated patterns. The TCRCN improves performance using circular dilated convolution to capture long-range temporal relations and eliminate boundary effects. Temporal attention mechanisms focus on salient patterns in the data, while weight normalization, GELU activation, and dropout ensure stability, regularization, and convergence.
Results: The proposed system uses physiological signals acquired from wearable sensors, including heart rate variability and electrodermal activity. Preprocessing steps such as one-hot encoding and data normalization standardize the inputs for feature extraction. Two fully connected layers perform classification on pooled learned representations to predict depression states.
Conclusion: Experimental analysis on the Depression Dataset confirmed the improved performance of the DCEM-TCRCN model, with an accuracy of 98.88%, precision of 97.76%, recall of 98.21%, and a Cohen's kappa score of 97.99%. The findings confirm the efficacy, trustworthiness, and stability of the model, making it usable for real-time psychological health monitoring.
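The circular dilated convolution used by the TCRCN wraps the sequence around instead of zero-padding, so every output position has a full receptive field and no boundary effect. A minimal numpy sketch of the idea (cross-correlation form, as deep learning frameworks use; the kernel and signal are illustrative, not the model's):

```python
import numpy as np

def circular_dilated_conv1d(x, w, dilation):
    """1D convolution (cross-correlation form) with wrap-around indexing and
    dilation: output length equals input length, with no boundary effects."""
    n, k = len(x), len(w)
    out = np.zeros(n)
    for i in range(n):
        for j in range(k):
            out[i] += w[j] * x[(i + j * dilation) % n]
    return out

x = np.arange(6, dtype=float)   # [0, 1, 2, 3, 4, 5]
w = np.array([1.0, 1.0])        # two-tap kernel: sums x[i] and x[i + dilation]
y = circular_dilated_conv1d(x, w, dilation=2)
# y[i] = x[i] + x[(i + 2) % 6], e.g. y[4] = x[4] + x[0] wraps around.
```

Stacking such layers with growing dilation extends the temporal receptive field exponentially, which is how long-range relations are captured without long kernels.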